ABSTRACT
Multi-core processors are widely used in computer systems. As the performance of microprocessors greatly exceeds that of memory, the memory wall becomes a limiting factor. It is important to understand how the large disparity of speed between processor and memory influences the performance and scalability of Java applications on emerging multi-core platforms.
In this paper, we study two popular Java benchmarks, SPECjbb2005 and SPECjvm2008, on multi-core platforms including Intel Clovertown and AMD Phenom. We focus on the "partially scalable" benchmark programs. With a small number of CPU cores these programs scale perfectly, but when more cores and software threads are used, the slope of the scalability curve degrades dramatically.
In our experiments with the partially scalable programs, we identified a strong correlation between scalability, object allocation rate, and memory bus write traffic. We find that these applications allocate large amounts of memory and consume almost all the memory write bandwidth of our hardware platforms. Because the write bandwidth is so limited, we propose the following hypothesis: for allocation-intensive Java applications on emerging multi-core platforms, scalability and performance are limited by object allocation, as if these applications were running into an "allocation wall".
To verify this hypothesis, we performed several experiments: measuring key architecture-level metrics, composing a micro-benchmark program, and studying the effect of modifying some of the "partially scalable" programs. All the experiments strongly suggest the existence of the allocation wall.
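As an illustration of the kind of micro-benchmark such an experiment might use, here is a minimal Java sketch. It is not the authors' program. The class name `AllocationScaling`, the 64-byte object size, the ring buffer, and the one-second measurement window are all assumptions chosen for illustration. Each thread allocates short-lived objects as fast as it can, and the program reports the aggregate allocation rate as the thread count doubles.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: measures aggregate allocation throughput versus thread count.
final class AllocationScaling {
    static final int OBJECT_BYTES = 64;   // assumed payload size per object
    static final int BATCH = 1024;        // allocations between clock checks

    /** Runs `threads` allocating workers for about `millis` ms; returns total payload bytes allocated. */
    static long run(int threads, long millis) {
        AtomicLong total = new AtomicLong();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Storing each object into a ring makes it escape, which is
                // intended to keep the JIT from optimizing the allocation away.
                Object[] ring = new Object[256];
                long local = 0;
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long deadline = System.nanoTime() + millis * 1_000_000L;
                while (System.nanoTime() < deadline) {
                    for (int i = 0; i < BATCH; i++) {
                        byte[] obj = new byte[OBJECT_BYTES];
                        obj[0] = (byte) i;          // touch the new object
                        ring[i & 255] = obj;
                        local += OBJECT_BYTES;
                    }
                }
                total.addAndGet(local);
            });
            workers[t].start();
        }
        start.countDown();                          // release all workers together
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        final long millis = 1000;
        int maxThreads = Runtime.getRuntime().availableProcessors();
        double base = 0;
        for (int n = 1; n <= maxThreads; n *= 2) {
            double mbPerSec = run(n, millis) / (1024.0 * 1024.0) / (millis / 1000.0);
            if (n == 1) base = mbPerSec;
            System.out.printf("threads=%d  alloc=%.0f MB/s  speedup=%.2f%n",
                    n, mbPerSec, mbPerSec / base);
        }
    }
}
```

If the hypothesis holds on a given machine, the reported speedup would be expected to flatten as the thread count grows. The sketch only measures allocation rate. It does not read hardware counters, so it cannot by itself attribute the flattening to memory write bandwidth.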
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms (OOPSLA '09)