Abstract
On any modern computer architecture today, parallelism comes with a modest cost, arising from the creation and management of threads or tasks. Today, programmers battle this cost by manually optimizing and tuning their code to minimize the cost of parallelism without harming its benefit, performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and about the impact of those factors on performance, especially at scale. In languages that support higher-order functions, the battle is harder still: higher-order functions can make it difficult, if not impossible, to reason about the costs and benefits of parallelism.
Motivated by these challenges and the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. A general solution, however, would combine the many correctness benefits of high-level languages with good performance, by managing parallelism without the programmer effort otherwise needed to ensure it. This paper proposes techniques for such automatic management of parallelism by combining static (compile-time) and dynamic (run-time) techniques. Specifically, we consider the Parallel ML language with task parallelism, and describe a compiler pipeline that embeds "potential parallelism" directly into the call stack and avoids the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that they can eliminate the burden of manual optimization while delivering good practical performance.
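To make the manual-tuning burden concrete, here is an illustrative sketch (not from the paper, and in Python rather than Parallel ML) of the granularity control that programmers typically write by hand today. The `THRESHOLD` cutoff is a hypothetical, machine-specific constant: set it too low and task-creation overhead swamps the useful work; set it too high and parallelism is lost. The paper's approach removes exactly this tuning step by deferring task creation to the run-time system.

```python
# Hand-tuned granularity control (illustrative sketch, not the paper's method).
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 20  # hypothetical, machine-specific sequential cutoff


def fib_seq(n):
    """Plain sequential version: no task-creation cost at all."""
    return n if n < 2 else fib_seq(n - 1) + fib_seq(n - 2)


def fib_par(pool, n):
    """Fork-join version that falls back to sequential code below the cutoff."""
    if n < THRESHOLD:
        return fib_seq(n)                        # small problem: stay sequential
    left = pool.submit(fib_par, pool, n - 1)     # spawn a task: pays creation cost
    right = fib_par(pool, n - 2)                 # continue in the current task
    return left.result() + right


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        print(fib_par(pool, 22))  # prints 17711
```

With automatic parallelism management, the programmer would write only the fork-join structure; the compiler records each fork as potential parallelism on the call stack, and the run-time system decides when it is worth promoting to an actual task.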
Index Terms
- Automatic Parallelism Management