
Hierarchical memory management for parallel programs

Published: 4 September 2016

Abstract

An important feature of functional programs is that they are parallel by default. Implementing an efficient parallel functional language, however, is a major challenge, in part because the high rate of allocation and freeing associated with functional programs requires an efficient and scalable memory manager.

In this paper, we present a technique for parallel memory management for strict functional languages with nested parallelism. At the highest level of abstraction, the approach has two parts: a way to organize memory as a hierarchy of heaps, and an algorithm that performs automatic memory reclamation by taking advantage of a disentanglement property of parallel functional programs. More specifically, the idea is to assign each parallel task its own heap in memory and to organize the heaps in a hierarchy (a tree) that mirrors the hierarchy of tasks.
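The heap hierarchy can be illustrated with a minimal Python sketch, assuming a binary fork/join task model. The names `Heap`, `alloc`, `fork`, and `join` are hypothetical. Merging the child heaps back into the parent when the subtasks join is an assumed policy chosen for illustration. This is not the MLton-based implementation the paper describes.

```python
class Heap:
    """One heap per task; the parent link mirrors the task tree (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.objects = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, value):
        # A task allocates only into its own heap, so allocation needs no
        # coordination with tasks running in parallel.
        self.objects.append(value)
        return value

    def depth(self):
        # Distance from the root heap, i.e. the nesting depth of the task.
        d, h = 0, self
        while h.parent is not None:
            d, h = d + 1, h.parent
        return d


def fork(heap):
    """Spawn two parallel subtasks, each with a fresh child heap (hypothetical API)."""
    return Heap(heap), Heap(heap)


def join(parent, left, right):
    """After both subtasks finish, fold their heaps into the parent (assumed policy)."""
    for child in (left, right):
        parent.objects.extend(child.objects)
        parent.children.remove(child)
        child.objects = []
        child.parent = None
```

For example, after `left, right = fork(root)`, each subtask allocates into its own heap at depth 1. Once `join(root, left, right)` runs, those objects belong to `root` and the heap tree again matches the task tree, which now has a single task.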

We present a nested-parallel calculus that specifies hierarchical heaps, and we prove in this calculus a disentanglement property, which prohibits a task from accessing objects allocated by another task that might execute in parallel. Leveraging the disentanglement property, we present a garbage collection technique that can operate on any subtree of the memory hierarchy concurrently with other tasks (and/or other collections) proceeding in parallel. We prove the safety of this collector by formalizing it in the context of our parallel calculus. In addition, we describe how the proposed techniques can be implemented on modern shared-memory machines and present a prototype implementation as an extension to MLton, a high-performance compiler for the Standard ML language. Finally, we evaluate the performance of this implementation on a number of parallel benchmarks.
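The following Python sketch shows how disentanglement can confine a collection to a subtree of heaps. It rests on a few stated assumptions. Data is purely functional (immutable), so objects in ancestor heaps, whose tasks are suspended while their children run, never point into the subtree. Roots are passed in explicitly. The collector is a plain sequential mark-and-sweep rather than the collector from the paper. `Obj`, `may_access`, `subtree`, and `collect_subtree` are hypothetical names.

```python
class Heap:
    """A task's heap; the parent link mirrors the task tree (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.objects = []
        if parent is not None:
            parent.children.append(self)


class Obj:
    """An immutable object allocated in `heap`, pointing to other Obj values."""

    def __init__(self, heap, fields=()):
        self.heap = heap
        self.fields = list(fields)
        heap.objects.append(self)


def is_ancestor_or_self(anc, heap):
    while heap is not None:
        if heap is anc:
            return True
        heap = heap.parent
    return False


def may_access(task_heap, obj):
    # Disentanglement: a task may touch objects in its own heap or an
    # ancestor's heap, never those of a task that might run in parallel.
    return is_ancestor_or_self(obj.heap, task_heap)


def subtree(heap):
    # All heaps in the tree rooted at `heap`.
    out, stack = set(), [heap]
    while stack:
        h = stack.pop()
        out.add(h)
        stack.extend(h.children)
    return out


def collect_subtree(root_heap, roots):
    """Mark-sweep restricted to heaps under root_heap; returns #objects freed."""
    region = subtree(root_heap)
    live, stack = set(), [r for r in roots if r.heap in region]
    while stack:
        o = stack.pop()
        if o in live:
            continue
        live.add(o)
        for f in o.fields:
            # Pointers leaving the region can only go to ancestor heaps,
            # which this collection neither traverses nor frees.
            if f.heap in region:
                stack.append(f)
    freed = 0
    for h in region:
        kept = [o for o in h.objects if o in live]
        freed += len(h.objects) - len(kept)
        h.objects = kept
    return freed
```

Because `collect_subtree` reads and writes only the heaps in `region`, collecting one sibling's subtree never touches the other sibling's heap. That confinement is what would let the two proceed in parallel. The sketch itself runs sequentially and does not model mutable references.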


Published in

ACM SIGPLAN Notices, Volume 51, Issue 9 (ICFP '16), September 2016, 501 pages
ISSN: 0362-1340; EISSN: 1558-1160
DOI: 10.1145/3022670

ICFP 2016: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, September 2016, 501 pages
ISBN: 9781450342193
DOI: 10.1145/2951913

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
