Abstract
A program has a determinacy race if logically parallel parts of the program access the same memory location and at least one of the accesses is a write. These races are generally bugs because they lead to non-deterministic program behavior: different schedules of the program can produce different results. Most prior work on detecting these races focuses on a subclass of programs with fork-join parallelism.
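To make the definition concrete, the following minimal Python sketch simulates two logically parallel tasks that both read and write one shared location. The task names `A` and `B`, the helper `run`, and the initial value are illustrative assumptions, not taken from the paper; the two serial orders stand in for two possible schedules.

```python
from itertools import permutations

def run(schedule):
    """Execute the hypothetical tasks in the given serial order on one shared cell."""
    mem = {"x": 1}
    tasks = {
        "A": lambda m: m.__setitem__("x", m["x"] + 1),  # reads and writes x
        "B": lambda m: m.__setitem__("x", m["x"] * 2),  # reads and writes x
    }
    for name in schedule:
        tasks[name](mem)
    return mem["x"]

# Try both schedules of the two parallel tasks.
results = {order: run(order) for order in permutations("AB")}
for order, value in results.items():
    print("schedule", "".join(order), "-> x =", value)
```

Because the two schedules leave different values in `x`, the pair of accesses is a determinacy race in the sense described above.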
This paper presents a race-detection algorithm, 2D-Order, for detecting races in a more general class of programs, namely programs whose dependence structure can be represented as planar dags embedded in 2D grids. Such dependence structures arise from programs that use pipelined parallelism or dynamic programming recurrences. Given a computation with T_1 work and T_∞ span, 2D-Order executes it while also detecting races in O(T_1/P + T_∞) time on P processors, which is asymptotically optimal.
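The idea of checking races against a grid-shaped dependence structure can be illustrated with a deliberately naive Python sketch. It is not the 2D-Order algorithm: it assumes a full grid in which node (i, j) has edges to (i+1, j) and (i, j+1), so that reachability reduces to comparing coordinates, and it stores every access instead of a bounded access history. The names `precedes`, `parallel`, `find_races`, and the sample trace are hypothetical.

```python
from collections import defaultdict

def precedes(u, v):
    """Assumed full grid dag: u reaches v iff u is at or before v in both coordinates."""
    return u != v and u[0] <= v[0] and u[1] <= v[1]

def parallel(u, v):
    """Two distinct nodes are logically parallel if neither reaches the other."""
    return u != v and not precedes(u, v) and not precedes(v, u)

def find_races(accesses):
    """accesses: iterable of (node, location, is_write); returns racing pairs."""
    history = defaultdict(list)  # location -> list of (node, is_write)
    races = []
    for node, loc, is_write in accesses:
        for prev_node, prev_write in history[loc]:
            # A race needs at least one write and two logically parallel nodes.
            if (is_write or prev_write) and parallel(prev_node, node):
                races.append((loc, prev_node, node))
        history[loc].append((node, is_write))
    return races

trace = [
    ((0, 0), "a", True),   # write at the grid source
    ((0, 1), "a", False),  # read, ordered after (0, 0)
    ((1, 0), "a", True),   # write, parallel with (0, 1)
    ((1, 1), "a", False),  # read, ordered after every earlier node
]
races = find_races(trace)
print(races)
```

On this trace the only reported pair is the read at (0, 1) and the write at (1, 0), since neither node reaches the other. This sketch does per-access work proportional to the history length, which is exactly the cost a practical detector has to avoid.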
We also implemented PRacer, a race detector based on 2D-Order for Cilk-P, a language for expressing pipeline parallelism. Empirical results demonstrate that PRacer incurs reasonable overhead and, when running on multiple cores, exhibits scalability similar to the baseline (executions without race detection).