Article
DOI: 10.1145/1297027.1297033 (ACM SPLASH/OOPSLA Conference Proceedings)

Statistically rigorous Java performance evaluation

Published: 21 October 2007

ABSTRACT

Java performance is far from trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.
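To make the "multiple iterations within a single VM invocation" scheme concrete, here is a minimal illustrative harness, not the paper's actual tooling. The `benchmarkIteration` workload is a placeholder; a real harness would call the benchmark's entry point there. Under this scheme the first iteration approximates startup performance, while later iterations approach steady state once the JIT has compiled the hot methods.

```java
public class BenchmarkHarness {
    // Placeholder workload; a real harness would invoke the
    // benchmark's entry point here instead.
    static void benchmarkIteration() {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += i;
        if (acc < 0) System.out.println(acc); // keep the loop live
    }

    public static void main(String[] args) {
        int iterations = 5; // one VM invocation, several iterations
        long[] times = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            benchmarkIteration();
            times[i] = System.nanoTime() - start;
        }
        // Iteration 1 reflects startup (JIT compilation still active);
        // later iterations move toward steady-state performance.
        for (int i = 0; i < iterations; i++)
            System.out.printf("iteration %d: %.3f ms%n", i + 1, times[i] / 1e6);
    }
}
```

Measuring across multiple VM invocations, by contrast, would repeat this entire program from a fresh JVM each time, e.g. from a shell loop.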

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance; in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
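The core of statistically rigorous data analysis here is reporting a confidence interval for the mean execution time rather than a single best or average number. A minimal sketch of that computation follows; the sample times and the Student-t critical value (2.262 for a 95% interval with 9 degrees of freedom) are illustrative assumptions, not data from the paper.

```java
import java.util.Arrays;

public class ConfidenceInterval {
    // Arithmetic mean of the measured execution times.
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(0.0);
    }

    // Unbiased sample standard deviation (divide by n - 1).
    static double stddev(double[] xs) {
        double m = mean(xs), ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // Half-width of the confidence interval: t * s / sqrt(n).
    // The critical value t depends on the confidence level and the
    // n - 1 degrees of freedom; the caller supplies it.
    static double halfWidth(double[] xs, double t) {
        return t * stddev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        // Hypothetical execution times (seconds) from 10 VM invocations.
        double[] times = {2.31, 2.35, 2.28, 2.40, 2.33,
                          2.29, 2.37, 2.32, 2.36, 2.30};
        double m = mean(times);
        double hw = halfWidth(times, 2.262); // t for 95%, 9 d.o.f.
        System.out.printf("mean = %.3f s, 95%% CI = [%.3f, %.3f]%n",
                          m, m - hw, m + hw);
    }
}
```

Two alternatives are then declared significantly different only if their confidence intervals do not overlap (or, more precisely, via a t-test on the difference of means), which is what guards against the misleading conclusions the paper documents.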


