research-article

Memory management for many-core processors with software configurable locality policies

Published: 15 June 2012

Abstract

As processors evolve towards higher core counts, architects will develop more sophisticated memory systems to satisfy the cores' increasing thirst for memory bandwidth. Early many-core processor designs suggest that future memory systems will likely include multiple controllers and distributed cache coherence protocols. Many-core processors that expose memory locality policies to the software system provide opportunities for automatic tuning that can achieve significant performance benefits.

Managed languages typically provide a simple heap abstraction. This paper presents techniques that bridge the gap between the simple heap abstraction of modern languages and the complicated memory systems of future processors. We present a NUMA-aware approach to garbage collection that balances the competing concerns of data locality and heap utilization to improve performance. We combine a lightweight approach for measuring an application's memory behavior with an online, adaptive algorithm for tuning the cache to optimize it for the specific application's behaviors.

We have implemented our garbage collector and cache tuning algorithm and present results on a 64-core TILEPro64 processor.

  • Published in

    ACM SIGPLAN Notices, Volume 47, Issue 11
    ISMM '12
    November 2012
    136 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2426642

    ISMM '12: Proceedings of the 2012 international symposium on Memory Management
    June 2012
    152 pages
    ISBN: 9781450313506
    DOI: 10.1145/2258996

    Copyright © 2012 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 15 June 2012

    Qualifiers

    • research-article
