Abstract
Coarse Grain Reconfigurable Architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple and moving the complexity to application mapping. One major challenge is data mapping. For reasons of power efficiency and complexity, CGRAs use multi-bank local memory, and each row of processing elements (PEs) shares memory access. So that each row of PEs can access any memory bank, a hardware arbiter sits between the memory requests generated by the PEs and the banks of the local memory. However, a fundamental restriction remains: a bank cannot be accessed by two different PEs at the same time. We propose to meet this challenge by mapping application operations onto PEs and data into memory banks in a way that avoids such conflicts. Our experimental results on kernels from multimedia benchmarks demonstrate that our local-memory-aware compilation approach can generate mappings that perform up to 40% better (17.3% on average) than those of a memory-unaware scheduler.
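The bank-conflict restriction can be illustrated with a small sketch. This is not the paper's mapping algorithm; it only counts conflicts for a given schedule and data placement. The function `count_bank_conflicts`, the `(cycle, pe_row, array)` schedule format, and the array-to-bank dictionaries are all hypothetical names chosen for illustration, and the sketch assumes each PE row issues at most one memory request per cycle.

```python
from collections import defaultdict


def count_bank_conflicts(schedule, bank_of):
    """Count (cycle, bank) pairs requested by more than one PE row.

    schedule: iterable of (cycle, pe_row, array_name) memory accesses
              (hypothetical format, assumed for this sketch).
    bank_of:  dict mapping each array name to the bank that holds it.
    """
    rows_per_slot = defaultdict(set)
    for cycle, pe_row, array in schedule:
        rows_per_slot[(cycle, bank_of[array])].add(pe_row)
    # A slot is a conflict when two different rows hit the same bank
    # in the same cycle.
    return sum(1 for rows in rows_per_slot.values() if len(rows) > 1)


# Two PE rows, each loading one array element per cycle.
schedule = [(0, 0, "a"), (0, 1, "b"), (1, 0, "c"), (1, 1, "a")]

# Placement chosen without regard to the schedule: "a" and "b" share bank 0.
unaware_placement = {"a": 0, "b": 0, "c": 1}
# Placement that separates arrays accessed in the same cycle.
aware_placement = {"a": 0, "b": 1, "c": 1}

conflicts_unaware = count_bank_conflicts(schedule, unaware_placement)
conflicts_aware = count_bank_conflicts(schedule, aware_placement)
```

In this toy case the first placement puts `a` and `b` in the same bank even though they are read in the same cycle, while the second placement separates them. A memory-aware mapper would search over both operation placement and data placement to drive such conflicts toward zero.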
Operation and data mapping for CGRAs with multi-bank memory
LCTES '10: Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems






