DOI: 10.1145/1640089.1640116 (ACM Conferences, SPLASH conference proceedings)
research-article

Allocation wall: a limiting factor of Java applications on emerging multi-core platforms

Published: 25 October 2009

ABSTRACT

Multi-core processors are widely used in computer systems. As the performance of microprocessors greatly exceeds that of memory, the memory wall becomes a limiting factor. It is important to understand how the large disparity of speed between processor and memory influences the performance and scalability of Java applications on emerging multi-core platforms.

In this paper, we study two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs. With a small number of CPU cores these programs scale perfectly, but when more cores and software threads are used, the slope of the scalability curve degrades dramatically.

We identified a strong correlation among scalability, object allocation rate, and memory bus write traffic in our experiments with the partially scalable programs. We find that these applications allocate large amounts of memory and consume almost all of the memory write bandwidth on our hardware platforms. Because the write bandwidth is so limited, we propose the following hypothesis: for object-allocation-intensive Java applications on emerging multi-core platforms, scalability and performance are limited by object allocation, as if these applications were running into an "allocation wall".

To verify this hypothesis, we performed several experiments: measuring key architecture-level metrics, composing a micro-benchmark program, and studying the effect of modifying some of the "partially scalable" programs. All of the experiments strongly suggest the existence of the allocation wall.
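The abstract does not describe the authors' micro-benchmark, so the following is only a minimal sketch of the kind of experiment it implies: each thread allocates short-lived objects in a tight loop, and the aggregate allocation rate is reported as the thread count grows. The class name `AllocationScaling`, the 1 KB object size, and the per-thread allocation count are all hypothetical choices, not values from the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AllocationScaling {
    // Hypothetical parameters, chosen for illustration only.
    static final int OBJECT_BYTES = 1024;
    static final long ALLOCATIONS_PER_THREAD = 2_000_000L;

    // Allocates `count` short-lived byte arrays of `size` bytes.
    // Each array is written to and read back, and the result is folded
    // into a checksum, so the allocation has an observable effect.
    static long allocateLoop(long count, int size) {
        long checksum = 0;
        for (long i = 0; i < count; i++) {
            byte[] block = new byte[size];
            int index = (int) (i % size);
            block[index] = 1;
            checksum += block[index];
        }
        return checksum;
    }

    // Runs allocateLoop on `threads` threads concurrently and returns the
    // aggregate allocation rate in MB/s (payload bytes only, no headers).
    static double measureMBPerSecond(int threads, long count, int size) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> allocateLoop(count, size);
            List<Future<Long>> results = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            long elapsedNanos = Math.max(System.nanoTime() - start, 1L);
            if (total != threads * count) {
                throw new IllegalStateException("unexpected checksum: " + total);
            }
            double megabytes = (double) threads * count * size / (1024.0 * 1024.0);
            return megabytes / (elapsedNanos / 1e9);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int maxThreads = Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= maxThreads; threads++) {
            double rate = measureMBPerSecond(threads, ALLOCATIONS_PER_THREAD, OBJECT_BYTES);
            System.out.printf("%d threads: %.0f MB/s allocated%n", threads, rate);
        }
    }
}
```

If the aggregate rate stops growing well before the thread count reaches the core count, that is consistent with a shared bottleneck such as memory write bandwidth, though confirming the cause would still need hardware counters of the kind the abstract mentions. A serious measurement would also add warm-up iterations and repeated runs.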
