Abstract
Soft processors can simplify field-programmable gate array (FPGA) application design for two reasons: they can be deployed only when needed, and it is easier to write and debug single-threaded software than to create hardware. The value of this second benefit grows with soft processor performance, yet soft processors do not employ the sophisticated out-of-order superscalar techniques that arrived in the mid-1990s, even though their area cost is now easily tolerable. In this article, we take an important step toward out-of-order execution in soft processors by exploring instruction scheduling on an FPGA substrate. This differs from the hard-processor design problem because the FPGA logic substrate is restricted to lookup tables (LUTs), whereas hard-processor scheduling circuits benefit greatly from content-addressable memory (CAM) and wired-OR structures. We discuss both circuit-level and microarchitectural trade-offs and compare three circuit structures for the scheduler, including a new structure called a fused-logic matrix scheduler. Using our optimized circuits, we show that four-issue distributed schedulers with up to 54 entries can be built with the same cycle time as the commercial Nios II/f soft processor (240 MHz). This careful design has the potential to significantly increase both the IPC (instructions per cycle) and the raw compute performance of a soft processor relative to current commercial soft processors.
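To make the scheduling problem concrete, here is a minimal Python sketch of a generic dependency-matrix scheduler. It is an illustrative model under stated assumptions, not the article's fused-logic matrix scheduler or any circuit from the paper. The class name `MatrixScheduler`, its methods, the oldest-first select policy, and the use of an age counter (instead of an age matrix) are all hypothetical simplifications. The point it shows is that wakeup needs no tag comparison: each entry stores a bit-vector of the entries it still waits on, and it is ready when that vector is all zeros. That is wide AND/OR logic, which a LUT fabric can build without the CAM cells a tag-broadcast scheduler relies on.

```python
from typing import List


class MatrixScheduler:
    """Toy dependency-matrix instruction scheduler (illustrative only).

    Row i of the matrix is stored as an integer bit-vector in self.dep[i]:
    bit j is set while entry i still waits for the result of entry j.
    Wakeup needs no tag comparison: an entry is ready when its row is zero.
    """

    def __init__(self, size: int, issue_width: int) -> None:
        self.size = size
        self.issue_width = issue_width
        self.dep = [0] * size          # dependence row per entry
        self.valid = [False] * size    # entry currently holds an instruction
        self.age = [0] * size          # insertion order; smaller means older
        self._next_age = 0

    def insert(self, producers: List[int]) -> int:
        """Place an instruction in a free entry.

        `producers` are entry indices of not-yet-issued instructions whose
        results this instruction consumes (found by renaming in a real core).
        """
        slot = self.valid.index(False)   # ValueError if the scheduler is full
        row = 0
        for p in producers:
            row |= 1 << p
        self.dep[slot] = row
        self.valid[slot] = True
        self.age[slot] = self._next_age
        self._next_age += 1
        return slot

    def cycle(self) -> List[int]:
        """Model one cycle: find ready entries, select, then clear columns."""
        ready = [i for i in range(self.size) if self.valid[i] and self.dep[i] == 0]
        ready.sort(key=lambda i: self.age[i])     # assumed oldest-first policy
        granted = ready[: self.issue_width]
        clear_mask = 0
        for g in granted:
            clear_mask |= 1 << g
            self.valid[g] = False                 # issued entry is freed
        for i in range(self.size):
            self.dep[i] &= ~clear_mask            # wake dependents of granted entries
        return granted


if __name__ == "__main__":
    s = MatrixScheduler(size=8, issue_width=2)
    a = s.insert([])        # entry 0
    b = s.insert([])        # entry 1
    c = s.insert([a, b])    # entry 2 consumes a and b
    d = s.insert([])        # entry 3, independent
    e = s.insert([c])       # entry 4 consumes c
    print(s.cycle())  # [0, 1]: d is also ready, but the issue width is 2
    print(s.cycle())  # [2, 3]
    print(s.cycle())  # [4]
```

Because columns are cleared at the end of the cycle in which a producer is selected, a dependent becomes ready on the next call, which models back-to-back issue of single-cycle operations. Real schedulers must also handle multi-cycle latencies, load misses, and a select circuit that fits the cycle-time budget; this sketch ignores all of these.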
High-Performance Instruction Scheduling Circuits for Superscalar Out-of-Order Soft Processors