Abstract
Virtual Machines (VMs) with Just-In-Time (JIT) compilers are traditionally thought to execute programs in two phases: the initial warmup phase determines which parts of a program would most benefit from dynamic compilation, before JIT compiling those parts into machine code; subsequently the program is said to be at a steady state of peak performance. Measurement methodologies almost always discard data collected during the warmup phase such that reported measurements focus entirely on peak performance. We introduce a fully automated statistical approach, based on changepoint analysis, which allows us to determine if a program has reached a steady state and, if so, whether that represents peak performance or not. Using this, we show that even when run in the most controlled of circumstances, small, deterministic, widely studied microbenchmarks often fail to reach a steady state of peak performance on a variety of common VMs. Repeating our experiment on 3 different machines, we found that at most 43.5% of <VM, Benchmark> pairs consistently reach a steady state of peak performance.
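The abstract does not spell out how changepoint analysis decides whether a run reached a steady state, or whether that state is peak performance. The sketch below shows one plausible reading and rests on several assumptions. It takes the in-process iteration times of a single process execution and splits them into segments of roughly constant mean, using exact penalized optimal partitioning with a squared-error cost. It treats a changepoint inside the final `tail` iterations as "no steady state". Otherwise, it labels the run "warmup" if the final segment is (within a tolerance) the fastest, and "slowdown" if an earlier segment was meaningfully faster. The functions `changepoints` and `classify`, the labels, the penalty, `tail`, `min_seg` and the 1% relative tolerance are all hypothetical choices made for illustration. They are not the paper's published parameters or code.

```python
# Hypothetical sketch, not the paper's implementation: label one sequence of
# in-process iteration times using penalized changepoint detection.
from typing import List, Sequence


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean of data[i:j] (mean-shift cost)."""
    n = j - i
    total = prefix[j] - prefix[i]
    total_sq = prefix_sq[j] - prefix_sq[i]
    return total_sq - total * total / n


def changepoints(data: Sequence[float], penalty: float, min_seg: int = 2) -> List[int]:
    """Exact optimal partitioning, O(n^2): minimize total cost + penalty per changepoint.

    Returns the indices at which new segments start (0 is never included).
    """
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    inf = float("inf")
    best = [inf] * (n + 1)  # best[t]: minimal penalized cost of data[:t]
    best[0] = -penalty      # so k segments pay (k - 1) penalties
    last = [0] * (n + 1)    # last[t]: start of the final segment in best[t]
    for t in range(min_seg, n + 1):
        for s in range(0, t - min_seg + 1):
            if best[s] == inf:
                continue  # data[:s] cannot be split into segments of >= min_seg
            cost = best[s] + segment_cost(prefix, prefix_sq, s, t) + penalty
            if cost < best[t]:
                best[t], last[t] = cost, s

    cps, t = [], n
    while t > 0:  # backtrack from the end of the series
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)


def classify(times: Sequence[float], penalty: float, tail: int = 500,
             rel_tol: float = 0.01, min_seg: int = 2) -> str:
    """Return 'flat', 'warmup', 'slowdown' or 'no steady state' (illustrative labels)."""
    data = list(times)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one timing")
    cps = changepoints(data, penalty, min_seg)
    # Assumption: a changepoint within the final `tail` iterations means the
    # series never settled, so no steady state was reached.
    if cps and n - cps[-1] < tail:
        return "no steady state"
    bounds = [0] + cps + [n]
    means = [sum(data[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    final = means[-1]
    tol = rel_tol * final  # assumed equivalence margin between segment means
    if all(abs(m - final) <= tol for m in means):
        return "flat"       # no meaningful change in performance
    if final - min(means) <= tol:
        return "warmup"     # steady state is (approximately) the fastest segment
    return "slowdown"       # steady state, but an earlier segment was faster
```

For example, `classify([10.0] * 5 + [5.0] * 10, penalty=1.0, tail=3)` returns `"warmup"`. A single changepoint is found at index 5, and the final segment (mean 5.0) is the fastest. The squared-error cost is in squared time units, so the penalty has to be chosen relative to the noise variance of real timings. A practical tool would also more likely use a pruned algorithm such as PELT, which reaches the same optimum as this quadratic loop far more cheaply on long series.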
Virtual machine warmup blows hot and cold