Abstract
Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the most widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2% to 50%. In this article we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter runtime) variants of these techniques, and compare them with each other and with several combinations of these techniques. To show their generality, we have implemented these optimizations in VMs for both Java and Forth. These techniques can eliminate nearly all of the dispatch branch mispredictions, and have other benefits, resulting in speedups by a factor of up to 4.55 over efficient threaded-code interpreters, and speedups by a factor of up to 1.34 over techniques relying on dynamic superinstructions alone.
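The effect of replication and superinstructions on a BTB can be illustrated with a small simulation. What follows is a minimal sketch, not the article's implementation: it assumes an idealized BTB (one entry per dispatch branch, no capacity or conflict misses) that predicts the target a branch took last time, and it assumes threaded code in which every VM instruction implementation ends in its own dispatch branch. The function name `btb_mispredictions` and the VM instruction names `A`, `B`, and `GOTO` are hypothetical.

```python
def btb_mispredictions(program, iterations=100):
    """Count dispatch mispredictions for a looped VM program.

    Assumption: an idealized BTB with one entry per dispatch branch.
    In threaded code the branch is identified by the VM instruction
    implementation that executes it, and the BTB predicts the target
    that this branch jumped to the previous time.
    """
    btb = {}  # dispatching VM instruction -> last observed target
    mispredictions = 0
    dispatches = 0
    n = len(program)
    for step in range(iterations * n):
        current = program[step % n]
        target = program[(step + 1) % n]
        if btb.get(current) != target:
            mispredictions += 1
        btb[current] = target
        dispatches += 1
    return mispredictions, dispatches


# Hypothetical VM loop: A occurs twice and is followed by a different
# instruction each time, so its single BTB entry keeps flipping.
plain = ["A", "B", "A", "GOTO"]

# Replication: each occurrence of A gets its own copy of the
# implementation, and therefore its own dispatch branch and BTB entry.
replicated = ["A1", "B", "A2", "GOTO"]

# Superinstruction: the sequence "A B" is fused into one implementation,
# which also removes one dispatch per loop iteration.
fused = ["A_B", "A", "GOTO"]

for name, prog in [("plain", plain), ("replicated", replicated), ("fused", fused)]:
    missed, total = btb_mispredictions(prog)
    print(f"{name}: {missed} mispredictions in {total} dispatches")
```

Under these assumptions the plain loop mispredicts roughly half of its dispatches, because the branch at the end of `A` alternates between two targets. The replicated and fused variants mispredict only on the first pass through the loop, when the BTB is still empty.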
Index Terms
Optimizing indirect branch prediction accuracy in virtual machine interpreters