Abstract
On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, ourgc improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
- T. A. Anderson. Optimizations in a private nursery-based garbage collector. In ISMM '10, pages 21--30. ACM, 2010. Google Scholar
Digital Library
- A. W. Appel. Simple generational garbage collection and fast allocation. SP&E, 19(2):171--183, 1989. Google Scholar
Digital Library
- A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schupbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP '09, pages 29--44. ACM, 2009. Google Scholar
Digital Library
- S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06, pages 169--190. ACM, 2006. Google Scholar
Digital Library
- K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. TOCS, 3(1):63--75, 1985. Google Scholar
Digital Library
- M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize, B. Lepers, V. Quema, and M. Roth. Traffic management: A holistic approach to memory placement on NUMA systems. In ASPLOS '13, pages 381--394. ACM, 2013. Google Scholar
Digital Library
- D. Doligez and X. Leroy. A concurrent, generational garbage collector for a multithreaded implementation of ml. In POPL '93, pages 113--123. ACM, 1993. Google Scholar
Digital Library
- Friendster. SNAP: network datasets: Friendster social network. http://snap.stanford.edu/data/com-Friendster.html, 2014.Google Scholar
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. Assessing the scalability of garbage collectors on many cores. In SOSP Workshop on Programming Languages and Operating Systems, PLOS '11, pages 1--5. ACM, 2011. Google Scholar
Digital Library
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In ASPLOS '13, pages 229--240. ACM, 2013. Google Scholar
Digital Library
- H2. H2 database engine. http://www.h2database.com/, 2014.Google Scholar
- M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. Google Scholar
Digital Library
- R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. Chapman & Hall/CRC, 1st edition, 2011. Google Scholar
Digital Library
- H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. CACM, 26(6):419--429, 1983. Google Scholar
Digital Library
- LinuxMemPolicy. What is Linux memory policy? http://www.kernel.org/doc/Documentation/vm/numa_memory_policy.txt, 2014.Google Scholar
- S. Marlow and S. Peyton Jones. Multicore garbage collection with local heaps. In ISMM '11, pages 21--32. ACM, 2011. Google Scholar
Digital Library
- Neo4j. Neo4j -- the world's leading graph database. http://www.neo4j.org, 2014.Google Scholar
- E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. In SOSP '09, pages 221--234. ACM, 2009. Google Scholar
Digital Library
- T. Ogasawara. NUMA-aware memory manager with dominant-thread-based copying GC. In OOPSLA '09, pages 377--390. ACM, 2009. Google Scholar
Digital Library
- K. Sivaramakrishnan, L. Ziarek, and S. Jagannathan. Eliminating read barriers through procrastination and cleanliness. In ISMM '12, pages 49--60. ACM, 2012. Google Scholar
Digital Library
- P. Sobalvarro. A lifetime-based garbage collector for LISP systems on general-purpose computers. Technical report, Cam- bridge, MA, USA, 1988. Google Scholar
Digital Library
- X. Song, H. Chen, R. Chen, Y. Wang, and B. Zang. A case for scaling applications to many-core with OS clustering. In EuroSys '11, pages 61--76. ACM, 2011. Google Scholar
Digital Library
- Spark. Apache Spark-- lightning-fast cluster computing. http://spark.apache.org, 2014.Google Scholar
- SPECjbb2005. SPECjbb2005 home page. http://www.spec.org/jbb2005/, 2014.Google Scholar
- SPECjbb2013. SPECjbb2013 home page. http://www.spec.org/jbb2013/, 2014.Google Scholar
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD '12, pages 1222--1230. ACM, 2012. Google Scholar
Digital Library
- B. Steensgaard. Thread-specific heaps for multi-threaded programs. In ISMM '00, pages 18--24. ACM, 2000. Google Scholar
Digital Library
- M. M. Tikir and J. K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05, pages 108--117. IEEE Computer Society, 2005. Google Scholar
Digital Library
- D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE '84, pages 157--167. ACM, 1984. Google Scholar
Digital Library
- P. R. Wilson and T. G. Moher. A "card-marking" scheme for controlling intergenerational references in generation-based garbage collection on stock hardware. SIGPLAN Notice, 24 (5):87--92, 1989. Google Scholar
Digital Library
- J. Zhou and B. Demsky. Memory management for many-core processors with software configurable locality policies. In ISMM '12, pages 3--14. ACM, 2012. Google Scholar
Digital Library
Index Terms
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
Recommendations
A study of the scalability of stop-the-world garbage collectors on multicores
ASPLOS '13Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms demonstrate good performance with a small number of cores, but have been shown to degrade badly beyond ...
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating SystemsOn contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...







Comments