Abstract
Indirect jump instructions are used to implement increasingly-common programming constructs such as virtual function calls, switch-case statements, jump tables, and interface calls. The performance impact of indirect jumps is likely to increase because indirect jumps with multiple targets are difficult to predict even with specialized hardware.
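To illustrate why multi-target indirect jumps are hard to predict, the following minimal sketch models a last-target predictor of the kind a branch target buffer provides. The function name `btb_mispredictions` and the method-name strings used as jump targets are hypothetical and chosen only for illustration; this is not the predictor configuration evaluated in the paper.

```python
from typing import Hashable, Iterable


def btb_mispredictions(targets: Iterable[Hashable]) -> int:
    """Count mispredictions of a last-target predictor for one indirect jump.

    Assumption: the predictor always guesses the target taken the previous
    time this jump executed, and starts with an empty entry.
    """
    last_target = None  # empty BTB entry before the first execution
    mispredictions = 0
    for actual in targets:
        if actual != last_target:
            mispredictions += 1
        last_target = actual  # remember the most recent target
    return mispredictions


# A call site that always dispatches to the same method (one target).
monomorphic = ["Circle.area"] * 8

# A polymorphic call site whose receiver type keeps changing.
polymorphic = ["Circle.area", "Square.area", "Triangle.area", "Square.area"] * 2

print(btb_mispredictions(monomorphic))  # only the cold first execution misses
print(btb_mispredictions(polymorphic))  # the target changes on every execution
```

In this toy model the single-target call site mispredicts only once, while the call site that cycles through several receiver types mispredicts on every execution.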
This paper proposes a new way of handling hard-to-predict indirect jumps: dynamically predicating them. The compiler (static or dynamic) identifies indirect jumps that are suitable for predication along with their control-flow merge (CFM) points. The hardware predicates the instructions between different targets of the jump and its CFM point if the jump turns out to be hard-to-predict at run time. If the jump would actually have been mispredicted, its dynamic predication eliminates a pipeline flush, thereby improving performance.
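The trade-off described above can be sketched with a toy cost model. Everything in it is an assumption made for illustration: the cycle counts `FLUSH_PENALTY` and `PREDICATION_COST`, the policy of predicating the most frequently seen targets, the last-target baseline, and the `hard_to_predict` flag, which stands in for whatever run-time mechanism decides that a jump is hard to predict. It is not the hardware design from the paper.

```python
from collections import Counter

FLUSH_PENALTY = 20     # assumed cycles lost per pipeline flush (illustrative)
PREDICATION_COST = 4   # assumed cycles spent fetching extra predicated paths


def simulate(targets, hard_to_predict, paths=2):
    """Return (cycles lost without predication, cycles lost with predication).

    Baseline: a last-target predictor that flushes on every misprediction.
    Predicated: if the jump is flagged hard to predict, the `paths` most
    frequent targets seen so far are executed up to the CFM point, so a
    flush happens only when the actual target is not among them.
    """
    history = Counter()
    last = None
    base = dip = 0
    for actual in targets:
        mispredicted = actual != last
        if mispredicted:
            base += FLUSH_PENALTY
        if hard_to_predict:
            predicated = {t for t, _ in history.most_common(paths)}
            dip += PREDICATION_COST          # overhead paid on every execution
            if actual not in predicated:
                dip += FLUSH_PENALTY         # actual target was not covered
        elif mispredicted:
            dip += FLUSH_PENALTY             # no predication: same as baseline
        history[actual] += 1
        last = actual
    return base, dip


print(simulate(["A", "B"] * 4, hard_to_predict=True))   # alternating targets
print(simulate(["A"] * 8, hard_to_predict=True))        # easy jump, predicated anyway
```

Under these assumed costs, the alternating jump loses 160 cycles to flushes without predication and 72 with it, whereas predicating the single-target jump costs 52 cycles against 20 for the baseline. This is consistent with predicating a jump only when it turns out to be hard to predict at run time.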
Our evaluations show that Dynamic Indirect jump Predication (DIP) improves the performance of a set of object-oriented applications including the Java DaCapo benchmark suite by 37.8% compared to a commonly-used branch target buffer based predictor, while also reducing energy consumption by 24.8%. We compare DIP to three previously proposed indirect jump predictors and find that it provides the best performance and energy-efficiency.
Improving the performance of object-oriented languages with dynamic predication of indirect jumps. In ASPLOS XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08).






