Abstract
As hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial development effort in performance tuning to achieve good performance for a range of benchmarks. Although optimizations typically perform well on average, they often have unpredictable impact on running time, sometimes degrading performance significantly. Today's VMs perform sophisticated feedback-directed optimizations, but these techniques do not address performance degradations, and they actually make the situation worse by making the system more unpredictable.This paper presents an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime. This work opens the door for a fundamental shift in the way optimizations are developed and tuned for online systems, and may allow the body of work in offline empirical optimization search to be applied automatically at runtime. We present our implementation and evaluation of this system in a product Java VM.
- A.-R. Adl-Tabatabai, J. Bharadwaj, D.-Y. Chen, A. Ghuloum, V. Menon, B. Murphy, M. Serrano, and T. Shpeisman. The Star-JIT compiler: A dynamic compiler for managed runtime environments. Intel Technology Journal, 7(1):19--31, Feb. 2003.Google Scholar
- A.-R. Adl-Tabatabai, R. L. Hudson, M. J. Serrano, and S. Subramoney. Prefetch injection based on hardware monitoring and object metadata. ACM SIGPLAN Notices, 39(6):267--276, June 2004. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. O'Boyle, J. Thomson, M. Toussaint, and C. Williams. Using machine learning to focus iterative optimization. In The International Symposium on Code Generation and Optimization, 2006. Google Scholar
Digital Library
- A. W. Appel. Simple generational garbage collection and fast allocation. Software-Practice and Experience, 19(2):171--183, Feb. 1989. Google Scholar
Digital Library
- M. Arnold, S. Fink, D. Grove, M. Hind, and P. F. Sweeney. Adaptive optimization in the Jalapeño JVM. ACM SIGPLAN Notices, 35(10):47--65, Oct. 2000. In Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). Google Scholar
Digital Library
- M. Arnold, S. J. Fink, D. Grove, M. Hind, and P. F. Sweeney. A survey of adaptive optimization in virtual machines. Proceedings of the IEEE, 93(2), 2005. Special issue on Program Generation, Optimization, and Adaptation.Google Scholar
Cross Ref
- M. Arnold and B. G. Ryder. A framework for reducing the cost of instrumented code. ACMSIGPLAN Notices, 36(5):168--179, May 2001. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- D. F. Bacon, R. Konuru, C. Murthy, and M. Serrano. Thin locks: Featherweight synchronization for Java. ACM SIGPLAN Notices, 33(5):258--268, May 1998. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In 1997 International Conference on Supercomputing, pages 340--347, 1997. Google Scholar
Digital Library
- J. Cavazos and J. E. B. Moss. Inducing heuristics to decide whether to schedule. ACM SIGPLAN Notices, 39(6):183--194, June 2004. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- C. Chen, J. Chame, and M. Hall. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In The International Symposium on Code Generation and Optimization, Mar. 2005. Google Scholar
Digital Library
- H. Chen, J. Lu, W.-C. Hsu, and P.-C. Yew. Continuous adaptive objectcode re-optimization framework. In Ninth Asia-Pacific Computer Systems Architecture, Sept. 2004.Google Scholar
Cross Ref
- B. Childers, J. Davidson, and M. L. Soffa. Continuous compilation: A new approach to aggressive and adaptive code transformation. In International Symposium on Parallel and Distributed Processing Symposium, Apr. 2003. Google Scholar
Digital Library
- T. M. Chilimbi and M. Hirzel. Dynamic hot data stream prefetching for general-purpose programs. ACM SIGPLAN Notices, 37(5):199--209, May 2002. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- http://www-plan.cs.colorado.edu/henkel/projects/colorado bench.Google Scholar
- K. Cooper, P. Schielke, and D. Subramanian. Optimizing for reduced code space using genetic algorithms. In ACM Conference on Languages, Compilers, and Tools for Embedded Systems, May 1999. Google Scholar
Digital Library
- K. D. Cooper, D. Subramanian, and L. Torczon. Adaptive optimizing compilers for the 21st century. Journal of Supercomputing, May 2002. Google Scholar
Digital Library
- http://pag.csail.mit.edu/daikon.Google Scholar
- J. Dean and C. Chambers. Towards better inlining decisions using inlining trials. In LISP and Functional Programming, pages 273--282, 1994. Google Scholar
Digital Library
- L. P. Deutsch and A. M. Schiffman. Efficient implementation of the Smalltalk-80 system. In 11th Symposium on Principles of Programming Languages (POPL), pages 297--302, Jan. 1984. Google Scholar
Digital Library
- P. C. Diniz and M. C. Rinard. Dynamic feedback: An effective technique for adaptive computing. ACM SIGPLAN Notices, 32(5):71--84, May 1997. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- M. Frigo. A fast Fourier transform compiler. ACM SIGPLAN Notices, 34(5):169--180, May 1999. In Conference on Programming Language Design and Implementation (PLDI). Google Scholar
Digital Library
- M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In 1998 IEEE International Conference on Acoustics Speech and Signal Processing, volume 3, pages 1381--1384. IEEE, 1998.Google Scholar
Cross Ref
- G. Fursin, A. Cohen, M. O'Boyle, and O. Temam. A practical method for quickly evaluating program optimizations. In Proceedings of the 1st International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2005), number 3793 in LNCS, pages 29--46. Springer Verlag, November 2005. Google Scholar
Digital Library
- N. Grcevski, A. Kilstra, K. Stoodley, M. Stoodley, and V. Sundaresan. Java just-in-time compiler and virtual machine improvements for server and middleware applications. In 3rd Virtual Machine Research and Technology Symposium (VM), May 2004. Google Scholar
Digital Library
- G. J. Hansen. Adaptive Systems for the Dynamic Run-time Optimization of Programs. PhD thesis, Carnegie-Mellon University, 1974. Google Scholar
Digital Library
- M. Hirzel, A. Diwan, and M. Hind. Pointer analysis in the pressence of dynamic class loading. In 18th European Conference on Object-Oriented Programming (ECOOP), volume 3086 of LNCS, pages 96--122, June 2004.Google Scholar
- U. Hölzle, C. Chambers, and D. Ungar. Optimizing dynamicallytyped object-oriented languages with polymorphic inline caches. In 5th European Conference on Object-Oriented Programming (ECOOP), volume 512 of LNCS, pages 21--38, July 1991. Google Scholar
Digital Library
- U. Hölzle and D. Ungar. Reconciling responsiveness with performance in pure object-oriented languages. Transactions on Programming Languages and Systems (TOPLAS), 18(4):355--400, July 1996. Google Scholar
Digital Library
- X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The garbage collection advantage: Improving program locality. ACM SIGPLAN Notices, 39(10):69--80, Oct. 2004. In Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). Google Scholar
Digital Library
- E.-J. Im and K. Yelick. Optimizing sparse matrix-vector multiplication for register reuse in SPARSITY. In International Conference on Computational Science, May 2001. Google Scholar
Digital Library
- E.-J. Im, K. Yelick, and R. Vuduc. SPARSITY: Optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications, 18(1), Jan. 2004. Google Scholar
Digital Library
- K. Ishizaki, M. Takeuchi, K. Kawachiya, T. Suganuma, O. Gohda, T. Inagaki, A. Koseki, K. Ogata, M. Kawahito, T. Yasue, T. Ogasawara, T. Onodera, H. Komatsu, and T. Nakatani. Effectiveness of crossplatform optimizations for a Java just-in-time compiler. ACMSIGPLAN Notices, 38(11):187--204, Nov. 2003. Google Scholar
Digital Library
- T. Kistler and M. Franz. Automated data-member layout of heap objects to improve memory-hierarchy performance. Transactions on Programming Languages and Systems (TOPLAS), 22(3):49--505, 2000. Google Scholar
Digital Library
- T. P. Kistler and M. Franz. Continuous program optimization: Design and evaluation. IEEE Transactions on Computers, 50(6):549--566, June 2001. Google Scholar
Digital Library
- P. Kulkarni, D. Whalley, G. Tyson, and J. Davidson. Exhaustive optimization phase order space exploration. In The International Symposium on Code Generation and Optimization, Mar. 2006. Google Scholar
Digital Library
- P. A. Kulkarni, S. R. Hines, D. B. Whalley, J. D. Hiser, J. W. Davidson, and D. L. Jones. Fast and efficient searches for effective optimization phase sequences. ACM Transactions on Architecture and Code Optimization, 2(2):165--198, June 2005. Google Scholar
Digital Library
- X. Li, M. J. Garzarán, and D. Padua. Optimizing sorting with genetic algorithms. In The International Symposium on Code Generation and Optimization, pages 99--110, Mar. 2005. Google Scholar
Digital Library
- D. Maier, P. Ramarao, M. Stoodley, and V. Sundaresan. Experiences with multithreading and dynamic class loading in a Java just-in-time compiler. In The International Symposium on Code Generation and Optimization, Mar. 2006. Google Scholar
Digital Library
- P. Nagpurkar, M. Hind, C. Krintz, P. F. Sweeney, and V. Rajan. Online phase detection algorithms. In The International Symposium on Code Generation and Optimization, Mar. 2006. Google Scholar
Digital Library
- N. Nethercote, D. Burger, and K. S. McKinley. Self-evaluating compilation applied to loop unrolling. Technical Report TR-06-12, The University of Texas at Austin, Feb. 2006.Google Scholar
- M. Paleczny, C. Vick, and C. Click. The Java Hotspot server compiler. In Java Virtual Machine Research and Technology Symposium (JVM), pages 1--12, Apr. 2001. Google Scholar
Digital Library
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2), 2005. Special issue on Program Generation, Optimization, and Adaptation.Google Scholar
Cross Ref
- R. M. Rabbah and K. V. Palem. Data remapping for design space optimization of embedded memory systems. ACM Transactions on Embedded Computing Systems, 2(2):1--32, May 2003. Google Scholar
Digital Library
- R. Saavedra and D. Park. Improving the effectiveness of software prefetching with adaptive execution. In Conference on Parallel Architectures and Compilation Techniques, 1996. Google Scholar
Digital Library
- http://www.sable.mcgill.ca/software/#soot.Google Scholar
- Standard Performance Evaluation Corporation. SPECjbb2000 Java Business Benchmark. http://www.spec.org/jbb2000.Google Scholar
- Standard Performance Evaluation Corporation. SPECjvm98 Benchmarks. http://www.spec.org/jvm98.Google Scholar
- M. Stephenson and S. Amarasinghe. Predicting unroll factors using supervised classification. In The International Symposium on Code Generation and Optimization, pages 123--134, 2005. Google Scholar
Digital Library
- T. Suganuma, T. Yasue, M. Kawahito, H. Komatsu, and T. Nakatani. A dynamic optimization framework for a Java just-in-time compiler. ACM SIGPLAN Notices, 36(11):180--195, Nov. 2001. In Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). Google Scholar
Digital Library
- The DaCapo Project. DaCapo Benchmark Suite, version beta051009. http://www-ali.cs.umass.edu/DaCapo/gcbm.html.Google Scholar
- N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 277--288, 2005. Google Scholar
Digital Library
- S. Triantafyllis, M. Vachharajani, and D. August. Compiler optimization-space exploration. Journal of Instruction-Level Parallelism, 7:1--25, Jan. 2005.Google Scholar
- M. J. Voss and R. Eigemann. High-level adaptive program optimization with ADAPT. ACM SIGPLAN Notices, 36(7):93--102, July 2001. Google Scholar
Digital Library
- R. C. Whaley, A. Petitet, and J. J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3--35, 2001.Google Scholar
Digital Library
- http://xml.apache.org/xerces2-j/index.html.Google Scholar
- K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. Padua, K. Pingali, P. Stodghill, and P.Wu. A comparison of empirical and model-driven optimization. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2003. Google Scholar
Digital Library
Index Terms
Online performance auditing: using hot optimizations without getting burned
Recommendations
Online performance auditing: using hot optimizations without getting burned
PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and ImplementationAs hardware complexity increases and virtualization is added at more layers of the execution stack, predicting the performance impact of optimizations becomes increasingly difficult. Production compilers and virtual machines invest substantial ...
Devirtualizable virtual machines enabling general, single-node, online maintenance
ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systemsMaintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two shortcomings that have prevented its broad acceptance. First, cluster-style ...
Devirtualizable virtual machines enabling general, single-node, online maintenance
ASPLOS '04Maintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two shortcomings that have prevented its broad acceptance. First, cluster-style ...







Comments