Microarchitecture and Circuits for a 200 MHz Out-of-Order Soft Processor Memory System

Published: 9 December 2016

Abstract

Although FPGAs have grown in capacity, FPGA-based soft processors have grown very little because of the difficulty of achieving higher performance in exchange for area. Superscalar out-of-order processors promise large performance gains, and the memory subsystem is a key part of such a processor that must help supply increased performance. In this article, we describe and explore microarchitectural and circuit-level tradeoffs in the design of such a memory system. We show the significant instructions-per-cycle wins for providing various levels of out-of-order memory access and memory dependence speculation (1.32× on SPECint2000) and for the addition of a second-level cache (another 1.60×). With careful microarchitecture and circuit design, we also achieve an L1 translation lookaside buffer and cache lookup with 29% less logic delay than the simpler Nios II/f memory system.

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 1
March 2017
206 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3002131
Editor: Steve Wilton

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 9 December 2016
• Accepted: 1 July 2016
• Revised: 1 June 2016
• Received: 1 February 2016
