Abstract
As processors evolve towards higher core counts, architects will develop more sophisticated memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many-core processor designs suggest that future memory systems will likely include multiple controllers and distributed cache coherence protocols. Many-core processors that expose memory locality policies to the software system provide opportunities for automatic tuning that can achieve significant performance benefits.
Managed languages typically provide a simple heap abstraction. This paper presents techniques that bridge the gap between the simple heap abstraction of modern languages and the complicated memory systems of future processors. We present a NUMA-aware approach to garbage collection that balances the competing concerns of data locality and heap utilization to improve performance. We combine a lightweight approach for measuring an application's memory behavior with an online, adaptive algorithm for tuning the cache to optimize it for the specific application's behaviors.
We have implemented our garbage collector and cache tuning algorithm and present results on a 64-core TILEPro64 processor.
Memory management for many-core processors with software configurable locality policies
ISMM '12: Proceedings of the 2012 International Symposium on Memory Management