research-article

Scientific Application Demands on a Reconfigurable Functional Unit Interface

Published: 01 May 2011
Abstract

Modern scientific applications are large, complex, and highly parallel; they are commonly executed on supercomputers with tens of thousands of processors. Yet these applications still commonly require weeks or even months to execute. Thus, single-thread performance remains a concern for highly parallel scientific applications. Adding a reconfigurable accelerator to each CPU could improve system performance; however, scientific applications have design constraints that differ from most application domains commonly accelerated by reconfigurable logic. In this article, we discuss the constraints imposed by scientific applications on the computation model, the accelerator architecture, and the accelerator’s communication interface with the CPU. Based on these constraints and application analysis, we have previously proposed adding a Reconfigurable Functional Unit (RFU) to accelerate integer graphs that calculate complex memory addresses. In this work, we now propose a flexible multi-instruction interface technique that allows dataflow graphs implemented on the RFU to access a large number of inputs and outputs with minor CPU datapath modifications. We present an in-depth examination of the performance effects of different communication interfaces that use this technique, and select one that best matches the needs of Sandia’s scientific applications. Although RFU execution overall improves performance, we also isolate two key negative performance effects introduced by aggregating CPU instructions into dataflow graphs: delayed issue and graph serialization. Finally, to demonstrate the marketability of an RFU beyond scientific applications, we reanalyze the proposed interfaces using the SPEC-fp benchmark suite. We show that although choosing an interface based on SPEC-fp needs is detrimental to Sandia application performance, choosing an interface based on Sandia demands works well for more general-purpose applications.

      • Published in

        ACM Transactions on Reconfigurable Technology and Systems  Volume 4, Issue 2
        May 2011
        216 pages
        ISSN:1936-7406
        EISSN:1936-7414
        DOI:10.1145/1968502

        Copyright © 2011 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 May 2011
        • Accepted: 1 February 2010
        • Revised: 1 August 2009
        • Received: 1 June 2008
        Qualifiers

        • research-article
        • Research
        • Refereed
