Dynamic Deadlock Verification for General Barrier Synchronisation

Published: 11 December 2018

Abstract

We present Armus, a verification tool for dynamically detecting or avoiding barrier deadlocks. The core design of Armus is based on phasers, a generalisation of barriers that supports split-phase synchronisation, dynamic membership, and optional-waits. This allows Armus to handle the key barrier synchronisation patterns found in modern languages and libraries. We implement Armus for X10 and Java, giving the first sound and complete barrier deadlock verification tools in these settings.

Armus introduces a novel event-based graph model of barrier concurrency constraints that distinguishes task-event and event-task dependencies. Decoupling these two kinds of dependencies facilitates the verification of distributed barriers with dynamic membership, a challenging feature of X10. Further, our base graph representation can be dynamically switched between a task-to-task model, Wait-for Graph (WFG), and an event-to-event model, State Graph (SG), to improve the scalability of the analysis.
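The relationship between the two kinds of dependencies and the two derived graphs can be illustrated with a minimal Java sketch. It is not the Armus implementation: the class `BarrierGraphSketch`, the helpers `edges`, `compose`, and `hasCycle`, and the edge-direction conventions (a task points to each event it waits for; an event points to each task that has yet to arrive at it) are assumptions made for illustration. Contracting the event layer gives a task-to-task graph in the spirit of the WFG, contracting the task layer gives an event-to-event graph in the spirit of the SG, and a cycle in either one signals a deadlock in this toy model.

```java
import java.util.*;

// Hypothetical sketch, not the Armus API: all names are illustrative assumptions.
class BarrierGraphSketch {

    // Builds an adjacency map from a flat list of (source, target) pairs.
    static Map<String, Set<String>> edges(String... pairs) {
        Map<String, Set<String>> g = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            g.computeIfAbsent(pairs[i], k -> new HashSet<>()).add(pairs[i + 1]);
        }
        return g;
    }

    // Contracts the middle layer: a -> c whenever a -> b in `first` and b -> c in `second`.
    static Map<String, Set<String>> compose(Map<String, Set<String>> first,
                                            Map<String, Set<String>> second) {
        Map<String, Set<String>> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : first.entrySet()) {
            Set<String> targets = out.computeIfAbsent(e.getKey(), k -> new HashSet<>());
            for (String mid : e.getValue()) {
                targets.addAll(second.getOrDefault(mid, Collections.<String>emptySet()));
            }
        }
        return out;
    }

    // Depth-first search for a cycle in a directed graph.
    static boolean hasCycle(Map<String, Set<String>> g) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String n : g.keySet()) {
            if (dfs(n, g, done, onPath)) return true;
        }
        return false;
    }

    private static boolean dfs(String n, Map<String, Set<String>> g,
                               Set<String> done, Set<String> onPath) {
        if (onPath.contains(n)) return true;   // back edge: a cycle exists
        if (done.contains(n)) return false;    // already fully explored
        onPath.add(n);
        for (String m : g.getOrDefault(n, Collections.<String>emptySet())) {
            if (dfs(m, g, done, onPath)) return true;
        }
        onPath.remove(n);
        done.add(n);
        return false;
    }

    public static void main(String[] args) {
        // Assumed scenario: t1 waits for e1, which needs t2; t2 waits for e2, which needs t1.
        Map<String, Set<String>> waits = edges("t1", "e1", "t2", "e2");
        Map<String, Set<String>> impedes = edges("e1", "t2", "e2", "t1");
        Map<String, Set<String>> taskGraph = compose(waits, impedes);   // task-to-task view
        Map<String, Set<String>> eventGraph = compose(impedes, waits);  // event-to-event view
        System.out.println("task graph cyclic:  " + hasCycle(taskGraph));
        System.out.println("event graph cyclic: " + hasCycle(eventGraph));
    }
}
```

In this scenario both derived graphs contain a cycle, so either view reports the deadlock. Which view is cheaper to build and search depends on how many tasks and events are involved, which is the motivation for switching between the two representations.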

Formally, we show that the verification is sound and complete with respect to the occurrence of deadlock in our core phaser language, and that switching graph representations preserves the soundness and completeness properties. These results are machine checked with the Coq proof assistant. Practically, we evaluate the runtime overhead of our implementations using three benchmark suites in local and distributed scenarios. Regarding deadlock detection, distributed scenarios show negligible overheads and local scenarios show overheads below 1.15×. Deadlock avoidance is more demanding, and highlights the potential gains from dynamic graph selection. In one benchmark scenario, the runtime overheads are 1.8× for dynamic selection, 2.6× for SG-static selection, and 5.9× for WFG-static selection.


Published in

ACM Transactions on Programming Languages and Systems, Volume 41, Issue 1
March 2019
235 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/3299867

Copyright © 2018 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 11 December 2018
• Accepted: 1 May 2018
• Revised: 1 January 2018
• Received: 1 March 2017

Qualifiers

• research-article
• Research
• Refereed
