Optimizing indirect branch prediction accuracy in virtual machine interpreters

Published: 01 October 2007

Abstract

Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the most widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2%--50%. In this article we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter runtime) variants of these techniques and compare them and several combinations of these techniques. To show their generality, we have implemented these optimizations in VMs for both Java and Forth. These techniques can eliminate nearly all of the dispatch branch mispredictions, and have other benefits, resulting in speedups by a factor of up to 4.55 over efficient threaded-code interpreters, and speedups by a factor of up to 1.34 over techniques relying on dynamic superinstructions alone.
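To make the effect of replication and superinstructions on a BTB concrete, the following is a minimal toy model, not the authors' experimental setup. It assumes an idealized BTB with one entry per dispatch branch (no capacity or conflict misses) that always predicts the target seen last time, and a threaded-code interpreter in which every VM instruction routine ends in its own indirect dispatch branch. The function name `btb_mispredictions` and the VM instruction names `A`, `B`, `C` are hypothetical.

```python
def btb_mispredictions(program, iterations):
    """Count dispatch mispredictions for a VM-code loop under an idealized BTB.

    program    -- list of routine names executed in order, then repeated
    iterations -- how many times the loop body runs
    Returns (mispredictions, total_dispatches).
    """
    btb = {}  # dispatch branch (one per routine) -> last observed target
    mispredictions = 0
    total = 0
    n = len(program)
    for _ in range(iterations):
        for i, routine in enumerate(program):
            target = program[(i + 1) % n]   # routine dispatched to next
            if btb.get(routine) != target:  # BTB predicts "same as last time"
                mispredictions += 1
            btb[routine] = target           # BTB is updated with the actual target
            total += 1
    return mispredictions, total


# Plain threaded code: the single copy of A dispatches alternately to B and C,
# so its branch is mispredicted on every execution.
plain = btb_mispredictions(["A", "B", "A", "C"], 100)

# Replication: two copies of A, each with its own dispatch branch and one target.
replicated = btb_mispredictions(["A1", "B", "A2", "C"], 100)

# Superinstruction: the sequence A B is merged into one routine, which also
# removes one dispatch per iteration.
superinstr = btb_mispredictions(["A_B", "A", "C"], 100)

print(plain, replicated, superinstr)
```

In this model the plain loop mispredicts about half of its dispatches, while both the replicated and the superinstruction variants mispredict only during the first iteration, when the BTB is still empty. Real BTBs have finite capacity, so the sketch overstates what replication can achieve once the number of replicas grows large.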
