ABSTRACT
Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms demonstrate good performance with a small number of cores, but have been shown to degrade badly beyond approximately 8 cores on a 48-core machine with OpenJDK 7. This negative result raises the question of whether the stop-the-world design has intrinsic limitations that would require a radically different approach. Our study suggests that the answer is no, and that there is no compelling scalability reason to discard the existing highly-optimised throughput-oriented GC code on contemporary hardware. This paper studies the default throughput-oriented garbage collector of OpenJDK 7, called Parallel Scavenge. We identify its bottlenecks, and show how to eliminate them using well-established parallel programming techniques. On the SPECjbb2005, SPECjvm2008 and DaCapo 9.12 benchmarks, the improved GC matches the performance of Parallel Scavenge at low core counts, but scales well, up to 48 cores.
Index Terms
A study of the scalability of stop-the-world garbage collectors on multicores






