Abstract
Parallel marking algorithms use multiple threads to walk the heap's object graph and mark each reachable object as live. A parallel marker thread marks an object "live" by atomically setting a bit in a mark-bitmap or in the object header. Most of these algorithms strive to improve marking throughput by using work stealing for load balancing, which keeps all participating threads busy. A purely "processor-centric" load-balancing approach, combined with the need to set the mark bit atomically, results in significant contention during parallel marking. This contention limits the scalability and throughput of parallel marking algorithms.
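As a minimal illustration of the atomic mark-bit update described above, the following Java sketch sets a bit in a shared bitmap with a compare-and-swap loop. The class `MarkBitmap` and its methods are hypothetical names chosen for this example and are not taken from the paper or from any JVM; the sketch assumes one mark bit per object slot, packed into 64-bit words.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical mark-bitmap: one bit per object slot, packed into 64-bit words. */
final class MarkBitmap {
    private final AtomicLongArray words;

    MarkBitmap(int numBits) {
        // Round up so that every bit index below numBits has a backing word.
        words = new AtomicLongArray((numBits + 63) >>> 6);
    }

    /** Returns true only for the one thread that flips the bit from 0 to 1. */
    boolean tryMark(int bitIndex) {
        int wordIndex = bitIndex >>> 6;
        long mask = 1L << (bitIndex & 63);
        while (true) {
            long old = words.get(wordIndex);
            if ((old & mask) != 0) {
                return false; // already marked by some thread
            }
            if (words.compareAndSet(wordIndex, old, old | mask)) {
                return true;  // this thread won the race
            }
            // CAS failed: another thread updated the same word, so retry.
        }
    }

    boolean isMarked(int bitIndex) {
        return (words.get(bitIndex >>> 6) & (1L << (bitIndex & 63))) != 0;
    }
}
```

Every failed `compareAndSet` in the loop is a retry caused by another thread writing the same word, which is the kind of contention the abstract refers to.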
We describe a new non-blocking, lock-free work-sharing algorithm whose primary goal is to reduce contention during atomic updates of the mark-bitmap by parallel task-threads. Our work-sharing mechanism uses the address of a word in the mark-bitmap as the key to stripe work among parallel task-threads, with only a subset of the task-threads working on each stripe. This filters out most of the contention during parallel marking and yields 20% improvements in performance.
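The striping idea can be sketched roughly as follows. This is not the paper's algorithm: it is a simplified Java illustration that assumes the index of a 64-bit bitmap word stands in for the word's address, that stripes are assigned by a simple modulo, and that each stripe has its own lock-free queue. The class `StripedWorkSharing` and all of its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical work-sharing structure keyed by mark-bitmap word. */
final class StripedWorkSharing {
    static final int BITS_PER_WORD = 64;

    // One lock-free queue per stripe; only the task-threads assigned
    // to a stripe are expected to drain its queue.
    private final List<ConcurrentLinkedQueue<Long>> queues = new ArrayList<>();

    StripedWorkSharing(int numStripes) {
        for (int i = 0; i < numStripes; i++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
    }

    /** Stripe derived from the bitmap word holding the object's mark bit (bitIndex >= 0). */
    int stripeOf(long bitIndex) {
        long wordIndex = bitIndex / BITS_PER_WORD;
        return (int) (wordIndex % queues.size());
    }

    /** Hands a marking task to the stripe that owns its bitmap word. */
    void share(long bitIndex) {
        queues.get(stripeOf(bitIndex)).add(bitIndex);
    }

    /** Returns the next task for the given stripe, or null if it is empty. */
    Long take(int stripe) {
        return queues.get(stripe).poll();
    }
}
```

Because all mark bits in one bitmap word map to the same stripe, only the threads serving that stripe ever update that word atomically, which is how this kind of partitioning can reduce contention.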
In the case of concurrent and on-the-fly collector algorithms, mutator threads also generate marking work for the marking task-threads. In these schemes, mutator threads are also provided with thread-local marking stacks in which they collect references to potentially "gray" objects, i.e., objects that have not yet been "marked through" by the collector. Because mutators generate this work when they reference these objects, the objects are highly likely to still be present in the processor cache. We describe and evaluate a scheme, cognizant of the processor and cache topology, for distributing mutator-generated marking work among the collector's task-threads. We prototype both of our algorithms within the C4 [28] collector, which ships as part of an industrial-strength JVM for the Linux-X86 platform.
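One way to picture topology-aware distribution is the Java sketch below, which routes a mutator's marking work to a collector task-thread in the same cache domain. It rests on several assumptions that the abstract does not state: that the CPU a mutator last ran on is known to the caller, that a precomputed table maps each CPU to a shared-cache domain, and that collector threads are pinned within domains. The class `CacheAwareRouter` and its tables are hypothetical and do not describe the scheme evaluated in the paper.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical router: sends mutator work to a collector thread sharing its cache. */
final class CacheAwareRouter {
    private final int[] cpuToDomain;        // assumed table: CPU id -> cache domain id
    private final int[][] domainToWorkers;  // assumed table: domain id -> collector thread ids
    private final AtomicIntegerArray cursor; // round-robin position per domain

    CacheAwareRouter(int[] cpuToDomain, int[][] domainToWorkers) {
        this.cpuToDomain = cpuToDomain.clone();
        this.domainToWorkers = domainToWorkers;
        this.cursor = new AtomicIntegerArray(domainToWorkers.length);
    }

    /** Picks a collector thread in the same cache domain as the mutator's CPU. */
    int workerFor(int mutatorCpu) {
        int domain = cpuToDomain[mutatorCpu];
        int[] workers = domainToWorkers[domain];
        int turn = cursor.getAndIncrement(domain);
        // floorMod keeps the index valid even after the counter overflows.
        return workers[Math.floorMod(turn, workers.length)];
    }
}
```

For example, with CPUs 0 and 1 in domain 0 and CPUs 2 and 3 in domain 1, work generated on CPU 1 would be handed only to collector threads listed for domain 0, so the objects are more likely to be found in a cache those threads share.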
References
- Intel® 64 and IA-32 architectures developer's manual: Combined volumes. URL http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.
- Intel® 64 architecture processor topology enumeration. URL http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/.
- Standard Performance Evaluation Corporation. SPEC JVM98. URL http://www.spec.org/jvm98/.
- The Volano benchmark. URL http://www.volano.com/benchmarks.html.
- N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In SPAA, pages 119--129, 1998.
- H. Azatchi, Y. Levanoni, H. Paz, and E. Petrank. An on-the-fly mark and sweep garbage collector based on sliding views. pages 269--281. doi:10.1145/949305.949329.
- K. Barabash, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, Y. Ossia, A. Owshanko, and E. Petrank. A parallel, incremental, mostly concurrent garbage collector for servers. ACM Trans. Program. Lang. Syst., 27 (6): 1097--1146, 2005.
- S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 169--190, New York, NY, USA, Oct. 2006. ACM Press. URL http://doi.acm.org/10.1145/1167473.1167488.
- H.-J. Boehm. Reducing garbage collector cache misses. pages 59--64.
- C.-Y. Cher, A. L. Hosking, and T. Vijaykumar. Software prefetching for mark-sweep garbage collection: Hardware analysis and software redesign. pages 199--210. doi:10.1145/1024393.1024417.
- C. Click, G. Tene, and M. Wolf. The pauseless GC algorithm. In Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments, VEE '05, pages 46--56, New York, NY, USA, 2005. ACM. ISBN 1-59593-047-7. URL http://doi.acm.org/10.1145/1064979.1064988.
- D. Detlefs and T. Printezis. A generational mostly-concurrent garbage collector. Technical report, Mountain View, CA, USA, 2000.
- D. Detlefs, C. H. Flood, S. Heller, and T. Printezis. Garbage-first garbage collection. In ISMM, pages 37--48, 2004.
- E. W. Dijkstra, L. Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. On-the-fly garbage collection: An exercise in cooperation. In Language Hierarchies and Interfaces: International Summer School, volume 46, pages 43--56. Marktoberdorf, Germany, 1976.
- T. Domani, E. K. Kolodner, and E. Petrank. A generational on-the-fly garbage collector for Java. In PLDI, pages 274--284, 2000.
- U. Drepper. What every programmer should know about memory. URL http://www.akkadia.org/drepper/cpumemory.pdf.
- T. Endo and K. Taura. Reducing pause time of conservative collectors. In MSP/ISMM, pages 119--131, 2002.
- T. Endo, K. Taura, and A. Yonezawa. Predicting scalability of parallel garbage collectors on shared memory multiprocessors. In Proceedings of the 15th International Parallel & Distributed Processing Symposium, IPDPS '01, pages 43--, Washington, DC, USA, 2001. IEEE Computer Society. ISBN 0-7695-0990-8. URL http://dl.acm.org/citation.cfm?id=645609.662496.
- C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel garbage collection for shared memory multiprocessors. In Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1267847.1267868.
- R. Garner, S. M. Blackburn, and D. Frampton. Effective prefetch for mark-sweep garbage collection. In ISMM, pages 43--54, 2007.
- R. H. Halstead. Multilisp: A language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7 (4): 501--538, Oct. 1985. doi:10.1145/4472.4478.
- R. Jones and C. Ryder. A study of Java object demographics. pages 121--130. doi:10.1145/1375634.1375652.
- R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. CRC Applied Algorithms and Data Structures. Chapman & Hall, Aug. 2011. ISBN 978-1420082791.
- T. Ogasawara. NUMA-aware memory manager with dominant-thread-based copying GC. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-766-0. URL http://doi.acm.org/10.1145/1640089.1640117.
- Y. Ossia, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, and A. Owshanko. A parallel, incremental and concurrent GC for servers. pages 129--140. doi:10.1145/512529.512546.
- F. Siebert. Limits of parallel marking garbage collection. In Proceedings of the 7th international symposium on Memory management, ISMM '08, pages 21--29, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7. URL http://doi.acm.org/10.1145/1375634.1375638.
- F. Siebert. Concurrent, parallel, real-time garbage-collection. In Proceedings of the 2010 international symposium on Memory management, ISMM '10, pages 11--20, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0054-4. URL http://doi.acm.org/10.1145/1806651.1806654.
- G. Tene, B. Iyengar, and M. Wolf. C4: the continuously concurrent compacting collector. In Proceedings of the international symposium on Memory management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0. URL http://doi.acm.org/10.1145/1993478.1993491.
- M. M. Tikir and J. K. Hollingsworth. NUMA-aware Java heaps for server applications. IPDPS '05, pages 108.2--. IEEE Computer Society. ISBN 0-7695-2312-9. URL http://dx.doi.org/10.1109/IPDPS.2005.299.
- M. Wu and X.-F. Li. Task-pushing: a scalable parallel GC marking algorithm without synchronization operations. In IPDPS, pages 1--10, 2007.
Index Terms
Scalable concurrent and parallel mark
ISMM '12: Proceedings of the 2012 international symposium on Memory Management