Abstract
An important feature of functional programs is that they are parallel by default. Implementing an efficient parallel functional language, however, is a major challenge, in part because the high rate of allocation and freeing associated with functional programs requires an efficient and scalable memory manager.
In this paper, we present a technique for parallel memory management for strict functional languages with nested parallelism. At the highest level of abstraction, the approach consists of a technique for organizing memory as a hierarchy of heaps, and an algorithm for performing automatic memory reclamation that takes advantage of a disentanglement property of parallel functional programs. More specifically, the idea is to assign each parallel task its own heap in memory and to organize the heaps in a tree that mirrors the hierarchy of tasks.
We present a nested-parallel calculus that specifies hierarchical heaps and prove in this calculus a disentanglement property, which prohibits a task from accessing objects allocated by another task that might execute in parallel. Leveraging the disentanglement property, we present a garbage collection technique that can operate on any subtree in the memory hierarchy while other tasks (and other collections) proceed in parallel. We prove the safety of this collector by formalizing it in the context of our parallel calculus. In addition, we describe how the proposed techniques can be implemented on modern shared-memory machines and present a prototype implementation as an extension to MLton, a high-performance compiler for the Standard ML language. Finally, we evaluate the performance of this implementation on a number of parallel benchmarks.
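The heap-per-task organization described above can be pictured with a small sketch. This is an illustration only, not the paper's implementation: the names `Heap`, `fork`, and `join` are hypothetical, and the policy of folding child heaps into the parent at a join is an assumption made for the example.

```python
class Heap:
    """One heap per task; `parent` links mirror the task tree (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.objects = []

    def alloc(self, value):
        # A task allocates only into its own heap.
        self.objects.append(value)
        return value


def fork(parent, left_name, right_name):
    """Spawn two child tasks, each with a fresh heap below `parent`."""
    left = Heap(left_name, parent)
    right = Heap(right_name, parent)
    parent.children = [left, right]
    return left, right


def join(parent):
    """Assumed join policy: fold the finished children's objects into the parent."""
    for child in parent.children:
        parent.objects.extend(child.objects)
    parent.children = []


root = Heap("root")
root.alloc("a")
left, right = fork(root, "left", "right")
left.alloc("b")   # allocated by the left task, in its own heap
right.alloc("c")  # allocated by the right task, in its own heap
join(root)        # after the join, the tree is a single heap again
```

In this model the heap tree grows at each fork and shrinks at each join, so its shape follows the task tree at every step.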
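One way to read the disentanglement property is as a check on the heap tree: a task may reach objects in its own heap or in an ancestor's heap, but not in a sibling's or a cousin's. The sketch below encodes that reading. It is an assumption made for illustration, and `Heap`, `is_ancestor_or_self`, and `may_access` are hypothetical names rather than parts of the calculus or of MLton.

```python
class Heap:
    """Minimal heap node; only the parent link matters for this check."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def is_ancestor_or_self(candidate, heap):
    """True if `candidate` lies on the path from `heap` up to the root."""
    while heap is not None:
        if heap is candidate:
            return True
        heap = heap.parent
    return False


def may_access(task_heap, object_heap):
    """Assumed tree reading of disentanglement: own heap or an ancestor's heap only."""
    return is_ancestor_or_self(object_heap, task_heap)


root = Heap("root")
left = Heap("left", root)
right = Heap("right", root)
leaf = Heap("leaf", left)

allowed = may_access(leaf, root)    # ancestor heap: permitted
blocked = may_access(left, right)   # sibling heap, may run in parallel: prohibited
```

Under this reading, no task running outside a subtree holds references into it, which suggests why a collector could work on that subtree while the rest of the program proceeds.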
Hierarchical memory management for parallel programs
ICFP 2016: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming