research-article

The Myrmics memory allocator: hierarchical, message-passing allocation for global address spaces

Published: 15 June 2012

Abstract

Constantly increasing hardware parallelism poses more and more challenges to programmers and language designers. One approach to harnessing this massive parallelism is to move to task-based programming models that rely on runtime systems for dependency analysis and scheduling. Such models generally benefit from the existence of a global address space. This paper presents the parallel memory allocator of the Myrmics runtime system, in which multiple allocator instances organized in a tree hierarchy cooperate to implement a global address space with dynamic region support on distributed memory machines. The Myrmics hierarchical memory allocator is a step towards improved productivity and performance in parallel programming. Productivity is improved through the use of dynamic regions in a global address space, which provide a convenient shared memory abstraction for dynamic and irregular data structures. Performance is improved through scaling on manycore systems without system-wide cache coherency. We evaluate the stand-alone allocator on an MPI-based x86 cluster and find that it scales well for up to 512 worker cores, while it can outperform Unified Parallel C by a factor of 3.7x to 10.7x.
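
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of tree-organized allocators that carve up one global address space and support regions. It is not the Myrmics API: the names `Allocator`, `Region`, `chunk`, and `free_all` are hypothetical, the "message" to a parent is simulated by a plain method call, and freed ranges are not coalesced.

```python
class Allocator:
    """One allocator instance in a tree (hypothetical, not the Myrmics API)."""

    def __init__(self, parent=None, base=0, size=0, chunk=1024):
        self.parent = parent
        self.chunk = chunk            # bytes requested from the parent per refill
        self.free = []                # (start, end) address ranges owned locally
        self.requests_to_parent = 0   # counts simulated upward messages
        if parent is None:
            # Assumption: the root initially owns the whole global address space.
            self.free.append((base, base + size))

    def _take(self, nbytes):
        """Carve nbytes from the first local range that fits, else return None."""
        for i, (start, end) in enumerate(self.free):
            if end - start >= nbytes:
                self.free[i] = (start + nbytes, end)
                return start
        return None

    def alloc(self, nbytes):
        addr = self._take(nbytes)
        if addr is None:
            if self.parent is None:
                raise MemoryError("global address space exhausted")
            # Stand-in for a request message sent up the tree.
            want = max(nbytes, self.chunk)
            self.requests_to_parent += 1
            start = self.parent.alloc(want)
            self.free.append((start, start + want))
            addr = self._take(nbytes)
        return addr


class Region:
    """A group of objects that is freed as a unit (hypothetical)."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.blocks = []

    def alloc(self, nbytes):
        addr = self.allocator.alloc(nbytes)
        self.blocks.append((addr, addr + nbytes))
        return addr

    def free_all(self):
        # Return every block of the region to the owning allocator at once.
        self.allocator.free.extend(self.blocks)
        self.blocks.clear()


# A three-level tree: one root, one mid-level allocator, two leaf allocators.
root = Allocator(base=0, size=1 << 20)
mid = Allocator(parent=root, chunk=4096)
w0 = Allocator(parent=mid, chunk=1024)
w1 = Allocator(parent=mid, chunk=1024)

r = Region(w0)
a = r.alloc(100)   # w0 refills from mid, which refills from root
b = r.alloc(100)   # served locally by w0, no upward request
c = w1.alloc(100)  # w1 refills from mid, which still has spare space
r.free_all()       # both objects of the region are released together
```

In this sketch most requests are served from a leaf's local pool, and a parent is contacted only when that pool runs dry, which is one plausible way a hierarchy can keep allocation traffic off a single central allocator.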



Published in

ACM SIGPLAN Notices, Volume 47, Issue 11 (ISMM '12)
November 2012
136 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2426642

ISMM '12: Proceedings of the 2012 International Symposium on Memory Management
June 2012
152 pages
ISBN: 9781450313506
DOI: 10.1145/2258996

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
