Abstract
We present Armus, a verification tool for dynamically detecting or avoiding barrier deadlocks. The core design of Armus is based on phasers, a generalisation of barriers that supports split-phase synchronisation, dynamic membership, and optional waits. This allows Armus to handle the key barrier synchronisation patterns found in modern languages and libraries. We implement Armus for X10 and Java, giving the first sound and complete barrier deadlock verification tools in these settings.
Armus introduces a novel event-based graph model of barrier concurrency constraints that distinguishes task-event and event-task dependencies. Decoupling these two kinds of dependencies facilitates the verification of distributed barriers with dynamic membership, a challenging feature of X10. Further, our base graph representation can be dynamically switched between a task-to-task model, the Wait-For Graph (WFG), and an event-to-event model, the State Graph (SG), to improve the scalability of the analysis.
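To make the two derived graph models concrete, the following is a minimal Python sketch, not the Armus implementation. It assumes that the task-event dependencies are given as a map `waits` (task to the events it is blocked on) and the event-task dependencies as a map `impedes` (task to the events it has not yet signalled). The function names, the dictionary encoding, and the `(phaser, phase)` event tuples are all hypothetical and chosen only for illustration. Under these assumptions, a deadlock corresponds to a cycle in either derived graph.

```python
from collections import defaultdict

def wait_for_graph(waits, impedes):
    """Task-to-task edges: t1 -> t2 when t1 waits on an event that t2 impedes."""
    impeders = defaultdict(set)
    for task, events in impedes.items():
        for event in events:
            impeders[event].add(task)
    return {task: {other for event in events for other in impeders[event]}
            for task, events in waits.items()}

def state_graph(waits, impedes):
    """Event-to-event edges: e1 -> e2 when a task impeding e1 waits on e2."""
    graph = defaultdict(set)
    for task, events in impedes.items():
        for event in events:
            graph[event] |= set(waits.get(task, ()))
    return dict(graph)

def has_cycle(graph):
    """Iterative three-colour depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every vertex starts WHITE (unvisited)
    for root in graph:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:  # back edge: a cycle exists
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:  # all successors explored
                colour[node] = BLACK
                stack.pop()
    return False

# Hypothetical scenario: t1 waits on phaser p while impeding q,
# and t2 waits on q while impeding p.
deadlocked_waits = {"t1": {("p", 1)}, "t2": {("q", 1)}}
deadlocked_impedes = {"t1": {("q", 1)}, "t2": {("p", 1)}}

# Hypothetical scenario: t1 waits on p, and t2 impedes p but is not blocked.
safe_waits = {"t1": {("p", 1)}}
safe_impedes = {"t2": {("p", 1)}}
```

In this sketch, `has_cycle(wait_for_graph(deadlocked_waits, deadlocked_impedes))` and `has_cycle(state_graph(deadlocked_waits, deadlocked_impedes))` agree with each other, as do the two checks on the safe scenario. The WFG has one vertex per blocked task and the SG one vertex per impeded event, which suggests why the cheaper representation depends on whether a program has many tasks and few barriers or the reverse.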
Formally, we show that the verification is sound and complete with respect to the occurrence of deadlock in our core phaser language, and that switching graph representations preserves the soundness and completeness properties. These results are machine-checked with the Coq proof assistant. Practically, we evaluate the runtime overhead of our implementations using three benchmark suites in local and distributed scenarios. Regarding deadlock detection, distributed scenarios show negligible overheads and local scenarios show overheads below 1.15×. Deadlock avoidance is more demanding, and highlights the potential gains from dynamic graph selection. In one benchmark scenario, the runtime overheads are 1.8× for dynamic selection, 2.6× for SG-static selection, and 5.9× for WFG-static selection.
Dynamic Deadlock Verification for General Barrier Synchronisation