Abstract
Although FPGAs have grown in capacity, FPGA-based soft processors have grown very little because of the difficulty of achieving higher performance in exchange for area. Superscalar out-of-order processors promise large performance gains, and the memory subsystem is a key part of such a processor that must help supply increased performance. In this article, we describe and explore microarchitectural and circuit-level tradeoffs in the design of such a memory system. We show significant instructions-per-cycle gains from providing various levels of out-of-order memory access and memory dependence speculation (1.32× on SPECint2000) and from adding a second-level cache (another 1.60×). With careful microarchitecture and circuit design, we also achieve an L1 translation lookaside buffer and cache lookup with 29% less logic delay than the simpler Nios II/f memory system.
Index Terms
Microarchitecture and Circuits for a 200 MHz Out-of-Order Soft Processor Memory System