Abstract
Delayed branches are commonly found in micro-architectures. A compiler or assembler can exploit them by moving code from one of several points in the program into the positions following the branch instruction. We present several strategies for moving code to fill the branch delay and discuss the requirements and benefits of each. An algorithm for processing branch delays has been implemented, and we give empirical results. The performance data show that a reasonable percentage of these delays can be avoided.
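As a rough illustration of what moving code into the position after a branch can look like, the following minimal sketch fills a single delay slot with the instruction that immediately precedes the branch. It is not the algorithm evaluated in the paper. It assumes one delay slot, a basic block that ends in a branch, and a hypothetical `Instr` record with one destination register and a tuple of source registers. It shows only one possible source of code (the instructions before the branch), and it falls back to a no-op when that instruction computes a value the branch reads.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction used only for this sketch."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop")


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Return a copy of `block` with one delay slot placed after its final branch.

    Assumes `block` is a basic block whose last instruction is the branch and
    that the delay-slot instruction always executes, taken or not.
    """
    *body, branch = block
    if body and body[-1].dest not in branch.srcs:
        # The preceding instruction does not produce a value the branch reads,
        # so it can run in the delay slot without changing the outcome.
        return body[:-1] + [branch, body[-1]]
    # No safe candidate directly before the branch: pad with a no-op.
    return body + [branch, NOP]


movable = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r5", "r6")),
    Instr("beq", None, ("r1", "r0")),
]
blocked = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("beq", None, ("r1", "r0")),
]

print([i.op for i in fill_delay_slot(movable)])
print([i.op for i in fill_delay_slot(blocked)])
```

In the first block the `sub` is independent of the branch condition and moves into the slot. In the second, the `add` defines `r1`, which the branch tests, so the slot is padded. Drawing candidates from other points, such as the branch target or the fall-through path, would need extra safety conditions that this sketch does not model.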
Optimizing delayed branches
MICRO 15: Proceedings of the 15th annual workshop on Microprogramming