Abstract
Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmark selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran, which stress (4) much less and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.
The DaCapo benchmarks: java benchmarking development and analysis





