Abstract
Managed languages typically use read barriers to interpret forwarding pointers introduced to keep track of copied objects. For example, in a multicore environment with thread-local heaps and a global, shared heap, an object initially allocated on a local heap may be copied to a shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be established in the original object to point to the copied object. This level of indirection avoids the need to update all of the references to the object that has been copied.
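The forwarding-pointer indirection described above can be illustrated with a minimal sketch. This is a toy model, not the runtime's actual object layout: the `Obj` class, the `heap` tag, and the helper names `copy_to_shared` and `read_barrier` are all assumptions made for illustration.

```python
class Obj:
    """Toy heap object; the field names are hypothetical."""
    def __init__(self, payload, heap):
        self.payload = payload
        self.heap = heap        # "local" or "shared"
        self.forward = None     # forwarding pointer, set once the object is copied


def copy_to_shared(obj):
    """Copy a local object to the shared heap and leave a forwarding pointer behind."""
    if obj.forward is None:
        obj.forward = Obj(list(obj.payload), "shared")
    return obj.forward


def read_barrier(obj):
    """Follow forwarding pointers so every access reaches the current copy."""
    while obj.forward is not None:
        obj = obj.forward
    return obj


local = Obj([1, 2, 3], "local")
alias = local                        # a second reference that is never updated
shared = copy_to_shared(local)

# A write through the barrier lands on the shared copy ...
read_barrier(alias).payload.append(4)
# ... while a read that skips the barrier still sees the stale original.
stale = alias.payload
fresh = read_barrier(alias).payload
```

The cost the paper targets is visible here: every access through `alias` must pay for the `read_barrier` check, even though most objects are never copied.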
In this paper, we consider the design of a managed runtime that eliminates read barriers. Our design is premised on the availability of a sufficient degree of concurrency to stall operations that would otherwise necessitate the copy. Stalled actions are deferred until the next local collection, which avoids exposing forwarding pointers to the mutator. In certain important cases, procrastination is unnecessary: lightweight runtime techniques can sometimes be used to copy objects eagerly when their set of incoming references is known, or when it can be determined that having multiple copies would not violate program semantics.
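The two strategies can be sketched as a toy model, under loudly stated assumptions. Dictionaries stand in for heap cells and stack frames, `known_refs` is a hypothetical record of an object's incoming references, and immutability is used as one example of a case where multiple copies are harmless. None of these names come from the paper, and the real runtime's checks are likely more refined.

```python
class Obj:
    """Toy object; `known_refs` is a hypothetical list of (container, key) referrers."""
    def __init__(self, payload, mutable=False, known_refs=None):
        self.payload = payload
        self.mutable = mutable
        self.known_refs = known_refs    # None means the referrers are unknown
        self.heap = "local"


class LocalHeap:
    def __init__(self):
        self.stalled = []               # stores deferred until the next local collection

    def store(self, shared_cell, key, obj):
        """Try shared_cell[key] = obj; return False if the store had to be stalled."""
        if obj.heap == "shared":
            shared_cell[key] = obj
            return True
        if not obj.mutable or obj.known_refs is not None:
            # Eager copy: either duplicates are harmless (immutable object),
            # or every incoming reference is known and can be patched now.
            dup = Obj(list(obj.payload), obj.mutable, obj.known_refs)
            dup.heap = "shared"
            for container, k in (obj.known_refs or []):
                container[k] = dup
            shared_cell[key] = dup
            return True
        # Procrastinate: no forwarding pointer is installed; the store waits.
        self.stalled.append((shared_cell, key, obj))
        return False

    def collect_local(self):
        """Model a local collection that moves stalled objects and fixes all references."""
        for cell, key, obj in self.stalled:
            obj.heap = "shared"
            cell[key] = obj
        self.stalled.clear()


heap, shared = LocalHeap(), {}

imm = Obj([1])                                   # immutable: safe to duplicate
done_imm = heap.store(shared, "a", imm)

frame = {}
mut = Obj([2], mutable=True, known_refs=[(frame, "x")])
frame["x"] = mut                                 # the only incoming reference
done_mut = heap.store(shared, "b", mut)

unk = Obj([3], mutable=True)                     # mutable, referrers unknown
done_unk = heap.store(shared, "c", unk)          # stalled
pending_before = "c" in shared
heap.collect_local()
```

In this model the mutator never observes a forwarding pointer: a store either completes with all references already consistent, or it is postponed until the collector can make them consistent.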
We evaluate our techniques on three platforms: a 16-core AMD64 machine, a 48-core Intel SCC, and an 864-core Azul Vega 3. Experimental results over a range of parallel benchmarks indicate that our approach leads to notable performance gains (20-32% on average) without incurring any additional complexity.
Eliminating read barriers through procrastination and cleanliness
ISMM '12: Proceedings of the 2012 International Symposium on Memory Management