Abstract
In multiprocessors, performance improvement is typically achieved by exploiting parallelism at a fixed granularity, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that allows this granularity to vary, so that resource utilization is optimized in addition to performance. Our reconfigurable multiprocessor, QuadroCore, combines the advantages of reconfigurability and parallel processing. This article presents a unified hardware-software approach to the design of QuadroCore. The design flow is enabled by compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle.
The QuadroCore has been implemented on a Xilinx XC2V6000 FPGA for functional validation and in UMC’s 90 nm standard-cell technology for performance estimation. A diverse set of applications has been mapped onto the reconfigurable multiprocessor to meet orthogonal performance characteristics in terms of time and power. Speedup measurements show a performance increase of 2 to 11 times over a single processor. Additionally, the reconfiguration scheme has been applied to save power in data-parallel applications. Gate-level simulations have been performed to measure the power-performance trade-offs for two computationally complex applications. The power reports confirm that this reconfiguration scheme yields power savings in the range of 15% to 24%.
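For readers who want the two reported metrics spelled out, the following is a minimal sketch of how a speedup factor and a percentage power saving are conventionally computed. The function names and all input numbers are hypothetical illustrations; they are not measurements from the article, and the authors' exact measurement procedure is not described here.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-processor runtime to multiprocessor runtime."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("runtimes must be positive")
    return t_single / t_multi


def power_saving_percent(p_baseline: float, p_reconfigured: float) -> float:
    """Relative power reduction, in percent, versus the baseline."""
    if p_baseline <= 0:
        raise ValueError("baseline power must be positive")
    return 100.0 * (p_baseline - p_reconfigured) / p_baseline


if __name__ == "__main__":
    # Hypothetical inputs: runtimes in clock cycles, power in milliwatts.
    print(speedup(1100.0, 100.0))            # 11.0
    print(power_saving_percent(50.0, 40.0))  # 20.0
```

Under these definitions, a speedup of 11 means the multiprocessor needs one eleventh of the single-processor runtime, and a 20% saving means the reconfigured design draws four fifths of the baseline power.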
Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis