Research Article · Open Access

Automatic Parallelism Management

Published: 05 January 2024

Abstract

On any modern computer architecture, parallelism comes with a modest cost, arising from the creation and management of threads or tasks. Today, programmers battle this cost by manually optimizing and tuning their code to minimize the cost of parallelism without harming its benefit: performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and about the impact of those factors on performance, especially at scale. In languages that support higher-order functions, the battle is harder still: higher-order functions can make it difficult, if not impossible, to reason about the costs and benefits of parallelism.
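To make the burden concrete, here is a minimal sketch of such manual tuning in Parallel ML, written against MPL's ForkJoin.par primitive. The GRAIN cutoff is a hypothetical value chosen purely for illustration; in practice it must be re-tuned for each machine and workload, which is exactly the effort this paper aims to eliminate.

    (* Manually tuned parallel Fibonacci in Parallel ML (MPL), where
       ForkJoin.par : (unit -> 'a) * (unit -> 'b) -> 'a * 'b.
       GRAIN is a hand-picked sequential cutoff: too low and task
       management overhead dominates; too high and parallelism is lost. *)
    val GRAIN = 20  (* hypothetical cutoff; must be tuned per machine *)

    fun fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

    fun pfib n =
      if n <= GRAIN then
        fib n  (* below the cutoff: plain sequential code *)
      else
        let
          val (a, b) =
            ForkJoin.par (fn () => pfib (n - 1), fn () => pfib (n - 2))
        in
          a + b
        end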

Motivated by these challenges and the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically, so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. But if a solution were possible, it could combine the many correctness benefits of high-level languages with performance, without the programmer effort usually needed to ensure it. This paper proposes techniques for such automatic management of parallelism by combining static (compile-time) and run-time techniques. Specifically, we consider the Parallel ML language with task parallelism, and describe a compiler pipeline that embeds "potential parallelism" directly into the call stack and avoids the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that they can eliminate the burden of manual optimization while delivering good practical performance.
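The following conceptual sketch (our illustration, not the paper's actual compilation scheme or MPL's internals) shows the shape of this design: par records the second branch as a cheap marker, runs sequentially by default, and lets the run-time system promote the marker into a real task. The helpers pushPromotable, promoted, popPromotable, and syncPromoted are hypothetical; the stubs below exist only to make the sketch self-contained.

    (* Conceptual sketch of "potential parallelism": par records the
       second branch as a cheap marker and runs sequentially by default.
       The stubs make the sketch self-contained; a real run-time system
       would implement them with scheduler support (e.g., promoting the
       oldest marker on a periodic heartbeat). *)
    datatype 'b promotable = Marker of (unit -> 'b)

    fun pushPromotable g = Marker g     (* stub: real version is an O(1) push *)
    fun promoted _ = false              (* stub: the run-time system decides  *)
    fun popPromotable _ = ()            (* stub: real version pops the marker *)
    fun syncPromoted (Marker g) = g ()  (* stub: real version joins the task  *)

    fun par (f, g) =
      let
        val m = pushPromotable g  (* record potential parallelism, cheaply *)
        val a = f ()              (* the first branch runs sequentially    *)
      in
        if promoted m
        then (a, syncPromoted m)           (* g became a real task: join it *)
        else (popPromotable m; (a, g ()))  (* still sequential: inline g    *)
      end

Under a design of this shape, unpromoted parallelism costs little more than an ordinary function call, while the run-time system retains the ability to convert potential parallelism into actual tasks whenever doing so pays off.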


• Published in

  Proceedings of the ACM on Programming Languages (PACMPL), Volume 8, Issue POPL
  January 2024, 2820 pages
  EISSN: 2475-1421
  DOI: 10.1145/3554315
  Published: 5 January 2024

  Copyright © 2024 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

  Publisher: Association for Computing Machinery, New York, NY, United States
