
Operation and data mapping for CGRAs with multi-bank memory

Published: 13 April 2010

Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs) promise high performance at high power efficiency. They fulfil this promise by keeping the hardware extremely simple and moving the complexity to application mapping. One major challenge is data mapping. For reasons of power efficiency and complexity, CGRAs use a multi-bank local memory, and the PEs in a row share memory access. To let each row of PEs access any memory bank, a hardware arbiter sits between the memory requests generated by the PEs and the banks of the local memory. However, a fundamental restriction remains: a bank cannot be accessed by two different PEs at the same time. We propose to meet this challenge by mapping application operations onto PEs and data into memory banks in a way that avoids such conflicts. Our experimental results on kernels from multimedia benchmarks show that our local-memory-aware compilation approach generates mappings that perform up to 40% better (17.3% on average) than those of a memory-unaware scheduler.
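The bank-conflict restriction described above can be made concrete with a small model. The sketch below is not the authors' algorithm; it assumes a fixed operation schedule in which each memory operation is recorded as a hypothetical (cycle, PE row, array) tuple, places each array entirely in one bank, ignores bank capacity, and searches placements exhaustively, which only scales to a handful of arrays. The names `count_bank_conflicts`, `best_bank_assignment`, and `EXAMPLE_OPS` are illustrative.

```python
from collections import defaultdict
from itertools import product

# Hypothetical model: each memory operation is (cycle, pe_row, array_name),
# and the PEs in one row share a single path to the local memory.
EXAMPLE_OPS = [(0, 0, "A"), (0, 1, "B"), (1, 0, "C"), (1, 1, "A")]


def count_bank_conflicts(mem_ops, bank_of):
    """Count extra requesters that hit an already-requested bank in the same cycle."""
    requests = defaultdict(set)  # (cycle, bank) -> set of PE rows requesting it
    for cycle, row, array in mem_ops:
        requests[(cycle, bank_of[array])].add(row)
    # A bank serves one requester per cycle; every additional row is a conflict.
    return sum(len(rows) - 1 for rows in requests.values() if len(rows) > 1)


def best_bank_assignment(mem_ops, num_banks):
    """Exhaustively try every array-to-bank placement (assumption: tiny inputs only)."""
    arrays = sorted({array for _, _, array in mem_ops})
    best = None
    for banks in product(range(num_banks), repeat=len(arrays)):
        bank_of = dict(zip(arrays, banks))
        conflicts = count_bank_conflicts(mem_ops, bank_of)
        if best is None or conflicts < best[0]:  # keep the first minimum found
            best = (conflicts, bank_of)
    return best


if __name__ == "__main__":
    print(best_bank_assignment(EXAMPLE_OPS, 2))  # (0, {'A': 0, 'B': 1, 'C': 1})
    print(best_bank_assignment(EXAMPLE_OPS, 1))  # (2, {'A': 0, 'B': 0, 'C': 0})
```

For these four accesses, keeping array A in a different bank from both B and C removes every conflict, whereas a single shared bank yields two conflicting requests. The paper addresses operation mapping as well as data mapping, so a fixed schedule, as assumed here, is a deliberate simplification.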
