Provably space-efficient parallel functional programming

Published: 04 January 2021

Abstract

Because of its many desirable properties, such as its ability to control effects and thereby avoid potentially disastrous race conditions, functional programming offers a viable approach to programming modern multicore computers. Over the past decade, several parallel functional languages, typically based on dialects of ML and Haskell, have been developed. These languages, however, have traditionally underperformed procedural languages (such as C and Java). The primary reason is their hunger for memory, which only grows with parallelism and causes traditional memory management techniques to buckle under the increased demand. Recent work opened a new angle of attack on this problem by identifying a memory property of determinacy-race-free parallel programs, called disentanglement, which limits the knowledge that concurrent computations have of each other's memory allocations. That work has shown some promise in delivering good time scalability.

In this paper, we present provably space-efficient automatic memory management techniques for determinacy-race-free functional parallel programs, allowing both pure and imperative programs where memory may be destructively updated. We prove that for a program with sequential live memory of R*, any P-processor garbage-collected parallel run requires at most O(R* · P) memory. We also prove a work bound of O(W + R* · P) for P-processor executions, which includes the cost of garbage collection. To achieve these results, we integrate thread scheduling with memory management. The idea is to coordinate memory allocation and garbage collection with thread scheduling decisions so that each processor can allocate memory without synchronization and independently collect a portion of memory by consulting a collection policy, which we formulate. The collection policy is fully distributed and does not require communicating with other processors. We show that the approach is practical by implementing it as an extension to the MPL compiler for Parallel ML. Our experimental results confirm our theoretical bounds and show that the techniques perform and scale well.
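To make the idea of a processor-local collection policy concrete, the following is a minimal sketch and not the policy formulated in the paper. It assumes a hypothetical rule: a processor collects its own heap once it has allocated at least `kappa` times the bytes that survived its previous collection. The names `LocalHeap`, `kappa`, `min_budget`, and `space_bound` are illustrative assumptions, and `space_bound` only evaluates the shape of the O(R* · P) bound for an assumed constant factor `c`.

```python
from dataclasses import dataclass


@dataclass
class LocalHeap:
    """Hypothetical per-processor heap; it shares no state with other processors."""
    live_after_last_gc: int = 0   # bytes that survived the previous local collection
    allocated_since_gc: int = 0   # bytes allocated since that collection
    kappa: int = 2                # assumed tuning constant, not taken from the paper
    min_budget: int = 1024        # assumed floor so tiny heaps are not collected constantly
    collections: int = 0

    def allocate(self, nbytes: int) -> None:
        # Allocation only touches processor-local counters, so no synchronization is needed.
        self.allocated_since_gc += nbytes

    def should_collect(self) -> bool:
        # The decision uses local information only: no communication with other processors.
        budget = max(self.min_budget, self.kappa * self.live_after_last_gc)
        return self.allocated_since_gc >= budget

    def collect(self, live_bytes: int) -> None:
        # `live_bytes` stands in for what a real collector would find by tracing.
        self.live_after_last_gc = live_bytes
        self.allocated_since_gc = 0
        self.collections += 1


def space_bound(r_star: int, p: int, c: int = 1) -> int:
    """Evaluate c * R* * P, the form of the space bound, for an assumed constant c."""
    return c * r_star * p
```

Under this assumed rule, each local heap stays within a constant factor of its own live data between collections, which is the kind of local invariant a distributed policy can maintain without global coordination.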

References

  1. 2011. Finagle: A Protocol-Agnostic RPC System. https://twitter.github.io/finagle/.Google ScholarGoogle Scholar
  2. 2015. Folly: Facebook Open-source Library. https://github.com/facebook/folly.Google ScholarGoogle Scholar
  3. Umut A. Acar, Guy Blelloch, Matthew Fluet, Stefan K. Muller, and Ram Raghunathan. 2015. Coupling Memory and Computation for Locality Management. In Summit on Advances in Programming Languages (SNAPL).Google ScholarGoogle Scholar
  4. Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. 2002. The Data Locality of Work Stealing. Theory of Computing Systems 35, 3 ( 2002 ), 321-347.Google ScholarGoogle Scholar
  5. Umut A. Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018. Heartbeat Scheduling: Provable Eficiency for Nested Parallelism. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Philadelphia, PA, USA) ( PLDI 2018 ). 769-782.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Umut A. Acar, Arthur Charguéraud, and Mike Rainey. 2013. Scheduling Parallel Programs by Work Stealing with Private Deques. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '13).Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Umut A. Acar, Arthur Charguéraud, and Mike Rainey. 2016a. Oracle-guided scheduling for controlling granularity in implicitly parallel languages. Journal of Functional Programming (JFP) 26 ( 2016 ), e23.Google ScholarGoogle Scholar
  8. Umut A. Acar, Arthur Charguéraud, Mike Rainey, and Filip Sieczkowski. 2016b. Dag-calculus: A Calculus for Parallel Computation. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016 ). 18-32.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Sarita V. Adve. 2010. Data races are evil with no exceptions: technical perspective. Commun. ACM 53, 11 ( 2010 ), 84.Google ScholarGoogle Scholar
  10. Shivali Agarwal, Rajkishore Barik, Dan Bonachea, Vivek Sarkar, R. K. Shyamasundar, and Katherine A. Yelick. 2007. Deadlock-free scheduling of X10 computations with bounded resources. In SPAA 2007: Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, San Diego, California, USA, June 9-11, 2007. 229-240.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. R. Allen and D. A. Padua. 1987. Debugging Fortran on a Shared Memory Machine. In Proceedings of the 1987 International Conference on Parallel Processing. 721-727.Google ScholarGoogle Scholar
  12. B. Alpern, L. Carter, and E. Feig. 1990. Uniform memory hierarchies. In Proceedings 31st Annual Symposium on Foundations of Computer Science. 600-608 vol. 2. https://doi.org/10.1109/FSCS. 1990.89581 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Todd A. Anderson. 2010. Optimizations in a private nursery-based garbage collector. In Proceedings of the 9th International Symposium on Memory Management, ISMM 2010, Toronto, Ontario, Canada, June 5-6, 2010. 21-30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Andrew W. Appel. 1989. Simple Generational Garbage Collection and Fast Allocation. Software Prac. Experience 19, 2 ( 1989 ), 171-183. http://www.cs.princeton.edu/fac/~appel/papers/143.psGoogle ScholarGoogle Scholar
  15. Andrew W. Appel and Zhong Shao. 1996. Empirical and analytic study of stack versus heap cost for languages with closures. Journal of Functional Programming 6, 1 (Jan. 1996 ), 47-74. ftp://dafy.cs.yale.edu/pub/papers/shao/stack.psGoogle ScholarGoogle ScholarCross RefCross Ref
  16. Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 1998. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures (Puerto Vallarta, Mexico) (SPAA '98). ACM Press, 119-129.Google ScholarGoogle Scholar
  17. Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 2001. Thread Scheduling for Multiprogrammed Multiprocessors. Theory of Computing Systems 34, 2 ( 2001 ), 115-144.Google ScholarGoogle Scholar
  18. Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data Structures for Parallel Computing. ACM Trans. Program. Lang. Syst. 11, 4 (Oct. 1989 ), 598-632.Google ScholarGoogle Scholar
  19. Sven Auhagen, Lars Bergstrom, Matthew Fluet, and John H. Reppy. 2011. Garbage collection for multicore NUMA machines. In Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness (MSPC). 51-57.Google ScholarGoogle Scholar
  20. Shai Avidan and Ariel Shamir. 2007. Seam carving for content-aware image resizing. In ACM SIGGRAPH 2007 papers. 10-es.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. David F. Bacon, Perry Cheng, and V.T. Rajan. 2003. A Real-Time Garbage Collecor with Low Overhead and Consistent Utilization. In Conference Record of the Thirtieth Annual ACM Symposium on Principles of Programming Languages (ACM SIGPLAN Notices). ACM Press, New Orleans, LA.Google ScholarGoogle Scholar
  22. M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. 2012. Legion: Expressing locality and independence with logical regions. In SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1-11. https://doi.org/10.1109/SC. 2012.71 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ales Bizjak, Daniel Gratzer, Robbert Krebbers, and Lars Birkedal. 2019. Iron: managing obligations in higher-order concurrent separation logic. PACMPL 3, POPL ( 2019 ), 65 : 1-65 : 30.Google ScholarGoogle Scholar
  24. Guy Blelloch and John Greiner. 1995. Parallelism in sequential functional languages. In Proceedings of the 7th International Conference on Functional Programming Languages and Computer Architecture (FPCA '95). ACM, 226-237.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Guy E. Blelloch. 1996. Programming Parallel Algorithms. Commun. ACM 39, 3 ( 1996 ), 85-97.Google ScholarGoogle Scholar
  26. Guy E. Blelloch and Perry Cheng. 1999. On Bounding Time and Space for Multiprocessor Garbage Collection. In Proceedings of SIGPLAN'99 Conference on Programming Languages Design and Implementation (ACM SIGPLAN Notices ). ACM Press, Atlanta, 104-117.Google ScholarGoogle Scholar
  27. Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. 2012. Internally deterministic parallel algorithms can be fast. In PPoPP ' 12 (New Orleans, Louisiana, USA). 181-192.Google ScholarGoogle Scholar
  28. Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2011. Scheduling irregular parallel computations on hierarchical caches. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (San Jose, California, USA) ( SPAA '11). 355-366.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Guy E. Blelloch and Phillip B. Gibbons. 2004. Efectively sharing a cache among threads. In SPAA (Barcelona, Spain).Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Guy E. Blelloch, Phillip B. Gibbons, and Yossi Matias. 1999. Provably eficient scheduling for languages with fine-grained parallelism. J. ACM 46 ( March 1999 ), 281-321. Issue 2.Google ScholarGoogle Scholar
  31. Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Girija J. Narlikar. 1997. Space-eficient Scheduling of Parallelism with Synchronization Variables. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (Newport, Rhode Island, USA) ( SPAA '97). 12-23.Google ScholarGoogle Scholar
  32. Guy E Blelloch, Phillip B Gibbons, and Harsha Vardhan Simhadri. 2010. Low depth cache-oblivious algorithms. In Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures. 189-199.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Guy E. Blelloch and John Greiner. 1996. A provable time and space eficient implementation of NESL. In Proceedings of the 1st ACM SIGPLAN International Conference on Functional Programming. ACM, 213-225.Google ScholarGoogle Scholar
  34. Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. 1994. Implementation of a Portable Nested Data-Parallel Language. J. Parallel Distrib. Comput. 21, 1 ( 1994 ), 4-14.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Eficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Santa Barbara, California, 207-216.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1996. Cilk: An Eficient Multithreaded Runtime System. J. Parallel and Distrib. Comput. 37, 1 ( 1996 ), 55-69.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Robert D. Blumofe and Charles E. Leiserson. 1998. Space-Eficient Scheduling of Multithreaded Computations. SIAM J. Comput. 27, 1 ( 1998 ), 202-229.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46 ( Sept. 1999 ), 720-748. Issue 5.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Robert L. Bocchino, Stephen Heumann, Nima Honarmand, Sarita V. Adve, Vikram S. Adve, Adam Welc, and Tatiana Shpeisman. 2011. Safe nondeterminism in a deterministic-by-default parallel language. In ACM POPL.Google ScholarGoogle Scholar
  40. Robert L. Bocchino, Jr., Vikram S. Adve, Danny Dig, Sarita V. Adve, Stephen Heumann, Rakesh Komuravelli, Jefrey Overbey, Patrick Simmons, Hyojin Sung, and Mohsen Vakilian. 2009. A type and efect system for deterministic parallel Java. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications (Orlando, Florida, USA) ( OOPSLA '09). 97-116.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Robert L Bocchino, Jr., Vikram S. Adve, Sarita V. Adve, and Marc Snir. 2009. Parallel programming must be deterministic by default. In First USENIX Conference on Hot Topics in Parallelism.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Hans-Juergen Boehm. 2011. How to Miscompile Programs with "Benign" Data Races. In 3rd USENIX Workshop on Hot Topics in Parallelism, HotPar'11, Berkeley, CA, USA, May 26-27, 2011.Google ScholarGoogle Scholar
  43. Richard P. Brent. 1974. The parallel evaluation of general arithmetic expressions. J. ACM 21, 2 ( 1974 ), 201-206.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM SDM.Google ScholarGoogle Scholar
  45. Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon L. Peyton Jones, Gabriele Keller, and Simon Marlow. 2007. Data parallel Haskell: a status report. In Proceedings of the POPL 2007 Workshop on Declarative Aspects of Multicore Programming, DAMP 2007, Nice, France, January 16, 2007. 10-18.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Philippe Charles, Christian Grothof, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. 2005. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (San Diego, CA, USA) ( OOPSLA '05). ACM, 519-538.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. 1998. Detecting data races in Cilk programs that use locks. In Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA '98).Google ScholarGoogle Scholar
  48. Perry Cheng and Guy Blelloch. 2001. A Parallel, Real-Time Garbage Collector. In Proceedings of SIGPLAN 2001 Conference on Programming Languages Design and Implementation (ACM SIGPLAN Notices ). ACM Press, Snowbird, Utah, 125-136.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Rezaul Alam Chowdhury and Vijaya Ramachandran. 2008. Cache-eficient dynamic programming algorithms for multicores. In Proc. 20th ACM Symposium on Parallelism in Algorithms and Architectures (Munich, Germany). ACM, New York, NY, USA, 207-216.Google ScholarGoogle Scholar
  50. Intel Corp. 2017. Knights landing (KNL): 2nd Generation Intel Xeon Phi processor. In Intel Xeon Processor E7 v4 Family Specification. https://ark.intel.com/products/series/93797/ Intel-Xeon-Processor-E7-v4-Family.Google ScholarGoogle Scholar
  51. Damien Doligez and Georges Gonthier. 1994. Portable, Unobtrusive Garbage Collection for Multiprocessor Systems. In Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages (ACM SIGPLAN Notices). ACM Press, Portland, OR. ftp://ftp.inria.fr/INRIA/Projects/para/doligez/DoligezGonthier94.ps.gzGoogle ScholarGoogle Scholar
  52. Damien Doligez and Xavier Leroy. 1993. A Concurrent Generational Garbage Collector for a Multi-Threaded Implementation of ML. In Conference Record of the Twentieth Annual ACM Symposium on Principles of Programming Languages (ACM SIGPLAN Notices). ACM Press, 113-123. file://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/publications/concurrentgc.ps.gzGoogle ScholarGoogle Scholar
  53. Tamar Domani, Elliot K. Kolodner, Ethan Lewis, Erez Petrank, and Dafna Sheinwald. 2002. Thread-Local Heaps for Java. In ISMM'02 Proceedings of the Third International Symposium on Memory Management (ACM SIGPLAN Notices), David Detlefs (Ed.). ACM Press, Berlin, 76-87. http://www.cs.technion.ac.il/~erez/publications.htmlGoogle ScholarGoogle Scholar
  54. Derek L. Eager, John Zahorjan, and Edward D. Lazowska. 1989. Speedup versus eficiency in parallel systems. IEEE Transactions on Computing 38, 3 ( 1989 ), 408-423.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Perry A. Emrath, Sanjoy Ghosh, and David A. Padua. 1991. Event Synchronization Analysis for Debugging Parallel Programs. In Supercomputing ' 91. 580-588.Google ScholarGoogle Scholar
  56. Kayvon Fatahalian, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken, William J. Dally, and Pat Hanrahan. 2006. Sequoia: Programming the Memory Hierarchy. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (Tampa, Florida) (SC '06). ACM, New York, NY, USA, Article 83.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Mingdong Feng and Charles E. Leiserson. 1997. Eficient Detection of Determinacy Races in Cilk Programs. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA). 1-11.Google ScholarGoogle Scholar
  58. Mingdong Feng and Charles E. Leiserson. 1999. Eficient Detection of Determinacy Races in Cilk Programs. Theory of Computing Systems 32, 3 ( 1999 ), 301-326.Google ScholarGoogle Scholar
  59. Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: eficient and precise dynamic race detection. SIGPLAN Not. 44, 6 ( June 2009 ), 121-133. https://doi.org/10.1145/1543135.1542490 Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Cormac Flanagan, Stephen N. Freund, Marina Lifshin, and Shaz Qadeer. 2008. Types for atomicity: Static checking and inference for Java. ACM Trans. Program. Lang. Syst. 30, 4 ( 2008 ), 20 : 1-20 : 53.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Matthew Fluet, Greg Morrisett, and Amal J. Ahmed. 2006. Linear Regions Are All You Need. In Proceedings of the 15th Annual European Symposium on Programming (ESOP).Google ScholarGoogle Scholar
  62. Matthew Fluet, Mike Rainey, and John Reppy. 2008. A scheduling framework for general-purpose parallel languages. In ACM SIGPLAN International Conference on Functional Programming (ICFP).Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. 2011. Implicitly threaded parallelism in Manticore. Journal of Functional Programming 20, 5-6 ( 2011 ), 1-40.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Matteo Frigo, Pablo Halpern, Charles E. Leiserson, and Stephen Lewin-Berlin. 2009. Reducers and Other Cilk++ Hyperobjects. In 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures. 79-90.Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In PLDI. 212-223.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. David K. Giford and John M. Lucassen. 1986. Integrating Functional and Imperative Programming. In Proceedings of the ACM Symposium on Lisp and Functional Programming (LFP). ACM Press, 22-38.Google ScholarGoogle Scholar
  67. Marcelo J. R. Gonçalves. 1995. Cache Performance of Programs with Intensive Heap Allocation and Generational Garbage Collection. Ph.D. Dissertation. Department of Computer Science, Princeton University.Google ScholarGoogle Scholar
  68. Marcelo J. R. Gonçalves and Andrew W. Appel. 1995. Cache Performance of Fast-Allocating Programs. In Record of the 1995 Conference on Functional Programming and Computer Architecture.Google ScholarGoogle Scholar
  69. Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. 2002. Region-Based Memory Management in Cyclone. In Proceedings of SIGPLAN 2002 Conference on Programming Languages Design and Implementation (ACM SIGPLAN Notices ). ACM Press, Berlin, 282-293.Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut A. Acar, and Matthew Fluet. 2018. Hierarchical memory management for mutable state. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2018, Vienna, Austria, February 24-28, 2018. 81-93.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Robert H. Halstead, Jr. 1984. Implementation of Multilisp: Lisp on a Multiprocessor. In Proceedings of the 1984 ACM Symposium on LISP and functional programming (Austin, Texas, United States) (LFP '84). ACM, 9-17.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Kevin Hammond. 2011. Why Parallel Functional Programming Matters: Panel Statement. In Reliable Software Technologies-Ada-Europe 2011-16th Ada-Europe International Conference on Reliable Software Technologies, Edinburgh, UK, June 20-24, 2011. Proceedings. 201-205.Google ScholarGoogle Scholar
  73. David R. Hanson. 1990. Fast Allocation and Deallocation of Memory Based on Object Lifetimes. Software Prac. Experience 20, 1 (Jan. 1990 ), 5-12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Shams Mahmood Imam and Vivek Sarkar. 2014. Habanero-Java library: a Java 8 framework for multicore programming. In 2014 International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, Languages and Tools, PPPJ ' 14. 75-86.Google ScholarGoogle ScholarCross RefCross Ref
  75. Intel. 2011. Intel Threading Building Blocks. https://www.threadingbuildingblocks.org/.Google ScholarGoogle Scholar
  76. Intel Corporation 2009a. Intel Cilk++ SDK Programmer's Guide. Intel Corporation. Document Number: 322581-001US.Google ScholarGoogle Scholar
  77. Intel Corporation 2009b. Intel(R) Threading Building Blocks. Intel Corporation. Available from http://www. threadingbuildingblocks.org/documentation.php.Google ScholarGoogle Scholar
  78. Richard Jones, Antony Hosking, and Eliot Moss. 2011. The garbage collection handbook: the art of automatic memory management. Chapman & Hall/CRC.Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018a. RustBelt: securing the foundations of the rust programming language. PACMPL 2, POPL ( 2018 ), 66 : 1-66 : 34. https://doi.org/10.1145/3158154 Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Ales Bizjak, Lars Birkedal, and Derek Dreyer. 2018b. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. J. Funct. Program. 28 ( 2018 ), e20.Google ScholarGoogle Scholar
  81. Gabriele Keller, Manuel M.T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. 2010. Regular, shape-polymorphic, parallel arrays in Haskell. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (Baltimore, Maryland, USA) ( ICFP '10). 261-272.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. A. Krishnamurthy, D. E. Culler, A. Dusseau, S. C. Goldstein, S. Lumetta, T. von Eicken, and K. Yelick. 1993. Parallel Programming in Split-C. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Portland, Oregon, USA) ( Supercomputing '93). ACM, New York, NY, USA, 262-273. https://doi.org/10.1145/169627.169724 Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2007. Optimistic Parallelism Requires Abstractions. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (San Diego, California, USA) ( PLDI '07). 211-222.Google ScholarGoogle Scholar
  84. Lindsey Kuper and Ryan R Newton. 2013. LVars: lattice-based data structures for deterministic parallelism. In Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing. ACM, 71-84.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014a. Taming the Parallel Efect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) (PLDI '14). ACM, New York, NY, USA, 2-14. https://doi.org/10. 1145/2594291.2594312 Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Lindsey Kuper, Aaron Turon, Neelakantan R. Krishnaswami, and Ryan R. Newton. 2014b. Freeze After Writing: Quasideterministic Parallel Programming with LVars. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Diego, California, USA) ( POPL '14). ACM, New York, NY, USA, 257-270.Google ScholarGoogle Scholar
  87. John Launchbury and Simon L. Peyton Jones. 1994. Lazy Functional State Threads. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, USA, June 20-24, 1994. 24-35.Google ScholarGoogle Scholar
  88. Matthew Le and Matthew Fluet. 2015. Partial Aborts for Transactions via First-class Continuations. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (Vancouver, BC, Canada) ( ICFP 2015 ). 230-242.Google ScholarGoogle ScholarDigital LibraryDigital Library
  89. Doug Lea. 2000. A Java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande (San Francisco, California, USA) ( JAVA '00). 36-43.Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. I-Ting Angelina Lee, Charles E. Leiserson, Tao B. Schardl, Zhunping Zhang, and Jim Sukha. 2015. On-the-Fly Pipeline Parallelism. TOPC 2, 3 ( 2015 ), 17 : 1-17 : 42.Google ScholarGoogle Scholar
  91. Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt. 2009. The design of a task parallel library. In Proceedings of the 24th ACM SIGPLAN conference on Object Oriented Programming Systems Languages and Applications (Orlando, Florida, USA) ( OOPSLA '09). 227-242.Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Peng Li, Simon Marlow, Simon L. Peyton Jones, and Andrew P. Tolmach. 2007. Lightweight concurrency primitives for GHC. In Proceedings of the ACM SIGPLAN Workshop on Haskell, Haskell 2007, Freiburg, Germany, September 30, 2007. 107-118.Google ScholarGoogle Scholar
  93. J. M. Lucassen and D. K. Giford. 1988. Polymorphic Efect Systems. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (San Diego, California, USA) ( POPL '88). ACM, New York, NY, USA, 47-57.Google ScholarGoogle Scholar
  94. Simon Marlow and Simon L. Peyton Jones. 2011. Multicore garbage collection with local heaps. In Proceedings of the 10th International Symposium on Memory Management, ISMM 2011, San Jose, CA, USA, June 04-05, 2011, Hans-Juergen Boehm and David F. Bacon (Eds.). ACM, 21-32.Google ScholarGoogle Scholar
  95. John Mellor-Crummey. 1991. On-the-fly Detection of Data Races for Programs with Nested Fork-Join Parallelism. In Proceedings of Supercomputing'91. 24-33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. Stefan Muller, Kyle Singer, Noah Goldstein, Umut A. Acar, Kunal Agrawal, and I-Ting Angelina Lee. 2020. Responsive Parallelism with Futures and State. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI).Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Stefan K. Muller and Umut A. Acar. 2016. Latency-Hiding Work Stealing: Scheduling Interacting Parallel Computations with Work Stealing. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016. 71-82.Google ScholarGoogle Scholar
  98. Stefan K. Muller, Umut A. Acar, and Robert Harper. 2017. Responsive Parallel Computation: Bridging Competitive and Cooperative Threading. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (Barcelona, Spain) ( PLDI 2017). ACM, New York, NY, USA, 677-692.Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Stefan K. Muller, Umut A. Acar, and Robert Harper. 2018a. Competitive Parallelism: Getting Your Priorities Right. Proc. ACM Program. Lang. 2, ICFP, Article 95 ( July 2018 ), 30 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Stefan K. Muller, Umut A. Acar, and Robert Harper. 2018b. Types and Cost Models for Responsive Parallelism. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP '18).Google ScholarGoogle Scholar
  101. Girija J. Narlikar and Guy E. Blelloch. 1999. Space-Eficient Scheduling of Nested Parallelism. ACM Transactions on Programming Languages and Systems 21 ( 1999 ).Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. Robert H. B. Netzer and Barton P. Miller. 1992. What are Race Conditions? ACM Letters on Programming Languages and Systems 1, 1 (March 1992 ), 74-88.Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Robert W. Numrich and John Reid. 1998. Co-array Fortran for Parallel Programming. SIGPLAN Fortran Forum 17, 2 (Aug. 1998 ), 1-31. https://doi.org/10.1145/289918.289920 Google ScholarGoogle ScholarDigital LibraryDigital Library
  104. Atsushi Ohori, Kenjiro Taura, and Katsuhiro Ueno. 2018. Making SML# a General-purpose High-performance Language. Unpublished Manuscript.Google ScholarGoogle Scholar
  105. OpenMP 5.0 2018. OpenMP Application Programming Interface, Version 5.0. Accessed in July 2018.Google ScholarGoogle Scholar
  106. Sungwoo Park, Frank Pfenning, and Sebastian Thrun. 2008. A Probabilistic Language Based on Sampling Functions. ACM Trans. Program. Lang. Syst. 31, 1, Article 4 (Dec. 2008), 46 pages.
  107. Simon L. Peyton Jones, Roman Leshchinskiy, Gabriele Keller, and Manuel M. T. Chakravarty. 2008. Harnessing the Multicores: Nested Data Parallelism in Haskell. In FSTTCS. 383-414.
  108. Simon L. Peyton Jones and Philip Wadler. 1993. Imperative Functional Programming. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Charleston, South Carolina, USA) (POPL '93). 71-84.
  109. Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, Muhammad Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. 12-25.
  110. Filip Pizlo, Erez Petrank, and Bjarne Steensgaard. 2008. A study of concurrent real-time garbage collectors. ACM SIGPLAN Notices 43, 6 (2008), 33-44.
  111. Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch. 2016. Hierarchical Memory Management for Parallel Programs. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (Nara, Japan) (ICFP 2016). ACM, New York, NY, USA, 392-406.
  112. Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin T. Vechev, and Eran Yahav. 2012. Scalable and precise dynamic datarace detection for structured parallelism. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, Beijing, China, June 11-16, 2012. 531-542.
  113. John C. Reynolds. 1978. Syntactic Control of Interference. In Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (Tucson, Arizona) (POPL '78). ACM, New York, NY, USA, 39-46.
  114. John C. Reynolds. 2002. Separation Logic: A Logic for Shared Mutable Data Structures. In 17th IEEE Symposium on Logic in Computer Science (LICS 2002), 22-25 July 2002, Copenhagen, Denmark, Proceedings. 55-74.
  115. Dan Robinson. 2017. HPE shows The Machine - with 160TB of shared memory. Data Center Dynamics (May 2017).
  116. Mads Rosendahl. 1989. Automatic complexity analysis. In FPCA '89: Functional Programming Languages and Computer Architecture. ACM, 144-156.
  117. D. T. Ross. 1967. The AED Free Storage Package. Commun. ACM 10, 8 (Aug. 1967), 481-492.
  118. Rust Team. 2019. Rust Language. https://www.rust-lang.org/
  119. David Sands. 1990a. Calculi for Time Analysis of Functional Programs. Ph.D. Dissertation. University of London, Imperial College.
  120. David Sands. 1990b. Complexity Analysis for a Lazy Higher-Order Language. In ESOP '90: Proceedings of the 3rd European Symposium on Programming. Springer-Verlag, London, UK, 361-376.
  121. Patrick M. Sansom and Simon L. Peyton Jones. 1995. Time and space profiling for non-strict, higher-order functional languages. In Principles of Programming Languages (San Francisco, California, United States). 355-366.
  122. Jacob T. Schwartz. 1975. Optimization of very high level languages (parts I and II). Computer Languages 2-3, 1 (1975), 161-194, 197-218.
  123. Julian Shun and Guy E. Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In PPOPP '13. ACM, New York, NY, USA, 135-146.
  124. Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (Pittsburgh, Pennsylvania, USA) (SPAA '12). 68-70.
  125. KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. arXiv preprint arXiv:2004.11663 (2020).
  126. K. C. Sivaramakrishnan, Lukasz Ziarek, and Suresh Jagannathan. 2014. MultiMLton: A multicore-aware runtime for standard ML. Journal of Functional Programming FirstView (6 2014), 1-62.
  127. A. Sodani. 2015. Knights landing (KNL): 2nd Generation Intel Xeon Phi processor. In 2015 IEEE Hot Chips 27 Symposium (HCS). 1-24.
  128. Daniel Spoonhower. 2009. Scheduling Deterministic Parallel Programs. Ph.D. Dissertation. Carnegie Mellon University. https://www.cs.cmu.edu/~rwh/theses/spoonhower.pdf
  129. Daniel Spoonhower, Guy E. Blelloch, Phillip B. Gibbons, and Robert Harper. 2009. Beyond Nested Parallelism: Tight Bounds on Work-stealing Overheads for Parallel Futures. In Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures (Calgary, AB, Canada) (SPAA '09). ACM, New York, NY, USA, 91-100.
  130. Daniel Spoonhower, Guy E. Blelloch, Robert Harper, and Phillip B. Gibbons. 2008. Space Profiling for Parallel Functional Programs. In International Conference on Functional Programming.
  131. Guy L. Steele, Jr. 1994. Building Interpreters by Composing Monads. In Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Portland, Oregon, USA) (POPL '94). ACM, New York, NY, USA, 472-492.
  132. Guy L. Steele Jr. 1990. Making Asynchronous Parallelism Safe for the World. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Programming Languages (POPL). ACM Press, 218-231.
  133. Tachio Terauchi and Alex Aiken. 2008. Witnessing Side Effects. ACM Trans. Program. Lang. Syst. 30, 3, Article 15 (May 2008), 42 pages.
  134. Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Information and Computation (Feb. 1997). http://www.diku.dk/research-groups/topps/activities/kit2/infocomp97.ps
  135. Aaron Turon, Derek Dreyer, and Lars Birkedal. 2013. Unifying refinement and hoare-style reasoning in a logic for higher-order concurrency. In ACM SIGPLAN International Conference on Functional Programming, ICFP'13, Boston, MA, USA, September 25-27, 2013. 377-390.
  136. Alexandros Tzannes, George C. Caragea, Uzi Vishkin, and Rajeev Barua. 2014. Lazy Scheduling: A Runtime Adaptive Scheduler for Declarative Parallelism. TOPLAS 36, 3, Article 10 (Sept. 2014), 51 pages.
  137. Robert Utterback, Kunal Agrawal, Jeremy T. Fineman, and I-Ting Angelina Lee. 2016. Provably Good and Practically Efficient Parallel Race Detection for Fork-Join Programs. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016. 83-94.
  138. Viktor Vafeiadis and Matthew J. Parkinson. 2007. A Marriage of Rely/Guarantee and Separation Logic. In CONCUR 2007 - Concurrency Theory, 18th International Conference, CONCUR 2007, Lisbon, Portugal, September 3-8, 2007, Proceedings. 256-271.
  139. David Walker. 2001. On Linear Types and Regions. In Proceedings of the First workshop on Semantics, Program Analysis and Computing Environments for Memory Management (SPACE'01). London. http://www.diku.dk/topps/space2001/program.html#DavidWalker
  140. Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. 2020. Disentanglement in Nested-Parallel Programs. In Proceedings of the 47th Annual ACM Symposium on Principles of Programming Languages (POPL).
  141. Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham, David Gay, Phil Colella, and Alex Aiken. 1998. Titanium: a high-performance Java dialect. Concurrency: Practice and Experience 10, 11-13 (1998), 825-836.
  142. Lukasz Ziarek, K. C. Sivaramakrishnan, and Suresh Jagannathan. 2011. Composable asynchronous events. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. 628-639.
