Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors

Published: 01 August 1989

Abstract

A new instruction fetch method, forward semantic, is proposed to enable deeply pipelined processors to fetch one useful instruction every cycle. Forward semantic is an improved alternative to delayed branching (with or without squashing), with five major advantages. First, no restriction is imposed on the type of instructions filling the branch slots, which allows a large number of slots to be filled. Second, no modification to the offsets and displacements is necessary when an instruction is copied to fill a branch slot, which simplifies the linker implementation. Third, an interrupted program can resume execution with a single program counter, eliminating the need to reload the instruction pipeline before resuming execution. Fourth, programs compiled with N slots can execute on pipelines requiring K (K ≤ N) slots, which makes compatibility across an architecture family possible. Lastly, the filling of branch slots is totally transparent to code compaction and software interlocking schemes. These advantages combine to provide an efficient instruction fetch mechanism and to eliminate artificial penalties on branch cost. At the cost of 11% static code expansion, forward semantic achieves an instruction fetch cost of 1.2 cycles for pipelines requiring 10 slots for each taken branch. This level of instruction fetch efficiency has never been achieved before with conventional instruction fetch methods. The branch cost is dictated by the accuracy of the compile-time branch prediction rather than by artificial limitations, such as data dependencies, that prevent the slots from being filled. These results are measured from the execution of real UNIX and CAD programs with complex control structures.
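The claim that branch cost depends only on prediction accuracy can be illustrated with a toy cost model. The sketch below is not taken from the paper: it assumes every useful instruction costs one fetch cycle and every mispredicted branch wastes all N slots. The function name `fetch_cost` and the branch fraction and misprediction rates are hypothetical values chosen for illustration.

```python
def fetch_cost(branch_fraction, mispredict_rate, slots):
    """Average fetch cycles per useful instruction under a toy model.

    Assumptions (illustrative, not from the paper):
      - each useful instruction costs exactly one fetch cycle;
      - each mispredicted branch wastes `slots` fetch cycles;
      - correctly predicted branches cost nothing extra.
    """
    wasted_per_instruction = branch_fraction * mispredict_rate * slots
    return 1.0 + wasted_per_instruction


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches, 10 slots.
    for rate in (0.05, 0.10, 0.20):
        cost = fetch_cost(branch_fraction=0.20, mispredict_rate=rate, slots=10)
        print(f"mispredict rate {rate:.2f}: {cost:.2f} cycles per instruction")
```

In this model the penalty grows linearly with the misprediction rate, so with all slots fillable the only remaining lever on fetch cost is the quality of the compile-time prediction.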


Published in

ACM SIGMICRO Newsletter, Volume 20, Issue 3
Sep. 1989
253 pages
ISSN: 1050-916X
DOI: 10.1145/75395

MICRO 22: Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
August 1989
253 pages
ISBN: 0897913248
DOI: 10.1145/75362

Copyright © 1989 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
