research-article

High-Performance Instruction Scheduling Circuits for Superscalar Out-of-Order Soft Processors

Published: 09 January 2018

Abstract

Soft processors have a role to play in simplifying field-programmable gate array (FPGA) application design as they can be deployed only when needed, and it is easier to write and debug single-threaded software code than create hardware. The breadth of this second role increases when the performance of the soft processor increases, yet the sophisticated out-of-order superscalar approaches that arrived in the mid-1990s are not employed, despite their area cost now being easily tolerable. In this article, we take an important step toward out-of-order execution in soft processors by exploring instruction scheduling in an FPGA substrate. This differs from the hard-processor design problem because the logic substrate is restricted to LUTs, whereas hard processor scheduling circuits employ CAM and wired-OR structures to great benefit. We discuss both circuit and microarchitectural trade-offs and compare three circuit structures for the scheduler, including a new structure called a fused-logic matrix scheduler. Using our optimized circuits, we show that four-issue distributed schedulers with up to 54 entries can be built with the same cycle time as the commercial Nios II/f soft processor (240 MHz). This careful design has the potential to significantly increase both the IPC and raw compute performance of a soft processor, compared to current commercial soft processors.


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3178391
Editor: Steve Wilton

          Copyright © 2018 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 January 2018
          • Accepted: 1 May 2017
          • Revised: 1 March 2017
          • Received: 1 November 2016

          Qualifiers

          • research-article
          • Research
          • Refereed
