DOI: 10.1145/1629395.1629433
research-article

CGRA express: accelerating execution using dynamic operation fusion

Published: 11 October 2009

ABSTRACT

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been used effectively for innermost loops that contain abundant instruction-level parallelism. Conversely, non-loop and outer-loop code is latency constrained and does not offer significant instruction-level parallelism. In these situations, CGRAs are ineffective because the majority of their resources remain idle. This paper introduces dynamic operation fusion to enable CGRAs to accelerate latency-constrained code regions effectively. Dynamic operation fusion is enabled by combining a small bypass network added between function units in a conventional CGRA with a sub-cycle modulo scheduler that automatically identifies opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.
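
As a rough illustration of why fusion helps latency-constrained code, the sketch below packs consecutive dependent operations into a single clock cycle whenever their combined delay still fits within that cycle. It is a toy model based on one reading of the abstract, not the paper's scheduler: the operation names, the per-operation delays, and the greedy packing rule are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical per-operation delays, as fractions of one clock cycle.
# These values are illustrative assumptions, not figures from the paper.
DELAY = {
    "and": Fraction(1, 4),
    "shift": Fraction(1, 4),
    "add": Fraction(1, 2),
    "mul": Fraction(1, 1),
}


def cycles_without_fusion(chain):
    """Baseline: every operation in a dependence chain takes a full cycle."""
    return len(chain)


def cycles_with_fusion(chain):
    """Greedily fuse consecutive dependent operations into one cycle
    as long as their summed delay does not exceed the cycle time."""
    cycles = 0
    used = Fraction(0)  # portion of the current cycle already consumed
    for op in chain:
        delay = DELAY[op]
        if cycles == 0 or used + delay > 1:
            # Start a new cycle for this operation.
            cycles += 1
            used = delay
        else:
            # The operation fits in the slack left in the current cycle.
            used += delay
    return cycles


# A purely sequential (latency-bound) chain of dependent operations.
chain = ["and", "shift", "add", "mul", "add", "shift"]
print(cycles_without_fusion(chain), cycles_with_fusion(chain))
```

In this toy model the six-operation chain needs six cycles without fusion and three with it, because the short logical and arithmetic operations share cycles. A real sub-cycle modulo scheduler would also have to account for routing through the bypass network and for resource conflicts, which this sketch ignores.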

Published in

CASES '09: Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
October 2009
298 pages
ISBN: 9781605586267
DOI: 10.1145/1629395

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States