(FPL 2015) Scavenger: Automating the Construction of Application-Optimized Memory Hierarchies

Published: 22 March 2017

Abstract

High-level abstractions separate algorithm design from platform implementation, allowing programmers to focus on algorithms while building complex systems. This separation also gives system programmers and compilers an opportunity to optimize platform services on an application-by-application basis. In field-programmable gate arrays (FPGAs), platform-level malleability extends to the memory system: unlike general-purpose processors, in which memory hardware is fixed at design time, the capacity, associativity, and topology of FPGA memory systems may all be tuned to improve application performance. Since application kernels may explicitly use only a few memory resources, substantial memory capacity may be available to the platform for use on behalf of the user program. In this work, we present Scavenger, which uses spare resources to construct program-optimized memories, and we perform an initial exploration of methods for automating the construction of these application-specific memory hierarchies. Although exploiting spare resources can be beneficial, naïvely consuming all memory resources may cause frequency degradation. To relieve timing pressure in large block RAM (BRAM) structures, we provide microarchitectural techniques that trade memory latency for design frequency. By examining a set of benchmarks, we demonstrate that our scalable cache microarchitecture achieves performance gains of 7% to 74% (with a geometric mean of 26%) over the baseline cache microarchitecture when the first-level caches are scaled to their maximum size.

    • Published in

      ACM Transactions on Reconfigurable Technology and Systems  Volume 10, Issue 2
      Special Section on Field Programmable Logic and Applications 2015 and Regular Papers
      June 2017
      133 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/3068424
      • Editor: Steve Wilton

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 March 2017
      • Accepted: 1 October 2016
      • Revised: 1 August 2016
      • Received: 1 April 2016

      Qualifiers

      • research-article
      • Research
      • Refereed