research-article

Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis

Published: 01 September 2010

Abstract

In multiprocessors, performance improvement is typically achieved by exploiting parallelism at a fixed granularity, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows this granularity to vary, in order to optimize resource utilization as well as performance. Our reconfigurable multiprocessor, QuadroCore, combines the advantages of reconfigurability and parallel processing. In this article, we present a unified hardware-software approach to the design of QuadroCore. This design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A dedicated reconfiguration mechanism has been developed that alters the architecture within a single clock cycle.
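The idea of compiler-driven reconfiguration can be illustrated with a small sketch. It assumes three hypothetical architectural variants, one for each parallelism granularity named above, and a simple hypothetical selection rule; the names `Mode`, `Region`, `choose_mode`, and `schedule`, the `RECONF`/`RUN` markers, and the default of four cores are illustrative assumptions, not QuadroCore's actual configuration set, compiler interface, or instruction encoding. What it shows is the general pattern: the compiler picks a variant per code region at compile time and emits a reconfiguration only where consecutive regions need different variants, which stays cheap when a switch takes a single clock cycle.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical architectural variants, one per parallelism granularity."""
    MIMD = "independent cores (task-level parallelism)"
    SIMD = "cores in lock-step on different data (data-level parallelism)"
    VLIW = "cores coupled to issue one wide instruction (instruction-level parallelism)"


@dataclass
class Region:
    """Compile-time summary of one code region (hypothetical analysis results)."""
    name: str
    independent_tasks: int  # number of mutually independent tasks found
    vectorizable: bool      # same operation applied element-wise over an array


def choose_mode(region: Region, cores: int = 4) -> Mode:
    """Hypothetical selection rule mapping region characteristics to a variant."""
    # Prefer coarse-grained parallelism, but only if it can occupy every core.
    if region.independent_tasks >= cores:
        return Mode.MIMD
    # Otherwise exploit data-level parallelism if the region is vectorizable.
    if region.vectorizable:
        return Mode.SIMD
    # Fall back to instruction-level parallelism within a single thread.
    return Mode.VLIW


def schedule(regions: list[Region]) -> list[str]:
    """Emit RUN markers, inserting RECONF only where the chosen variant changes."""
    out: list[str] = []
    current: Mode | None = None
    for region in regions:
        mode = choose_mode(region)
        if mode is not current:
            out.append(f"RECONF {mode.name}")
            current = mode
        out.append(f"RUN {region.name}")
    return out
```

For example, two consecutive vectorizable regions produce a single `RECONF SIMD` marker, because the second region reuses the configuration that is already active.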

QuadroCore has been implemented on a Xilinx XC2V6000 FPGA for functional validation and in UMC's 90 nm standard-cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to target the orthogonal performance metrics of execution time and power. Speedup measurements show a 2 to 11 times performance increase compared with a single processor. In addition, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that this reconfiguration scheme yields power savings of 15 to 24%.
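The two headline metrics can be stated precisely with a short sketch. It assumes the conventional definitions: speedup as the ratio of single-processor to multiprocessor execution time for the same workload, and power saving as the relative reduction with respect to the non-reconfigured baseline. The abstract does not spell out these definitions, so treat them as the usual ones rather than the authors' exact methodology; the function names and input values are hypothetical.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Speedup relative to one processor: T_single / T_multi for the same workload."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("execution times must be positive")
    return t_single / t_multi


def power_saving_percent(p_baseline: float, p_reconfigured: float) -> float:
    """Relative power saving: 100 * (P_baseline - P_reconfigured) / P_baseline."""
    if p_baseline <= 0:
        raise ValueError("baseline power must be positive")
    return 100.0 * (p_baseline - p_reconfigured) / p_baseline


# Hypothetical measurements (not from the article), chosen to land on the
# upper ends of the reported ranges; any consistent time and power units work.
print(speedup(44.0, 4.0))                 # 11.0
print(power_saving_percent(100.0, 76.0))  # 24.0
```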

    • Published in

      ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 3
      September 2010
      231 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/1839480

      Copyright © 2010 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2010
      • Accepted: 1 May 2009
      • Revised: 1 March 2009
      • Received: 1 July 2008

      Qualifiers

      • research-article
      • Research
      • Refereed
