Abstract
We present Armus, a dynamic verification tool for deadlock detection and avoidance that specialises in barrier synchronisation. Barriers coordinate the execution of groups of tasks and serve as a building block of parallel computing. Our tool verifies more barrier synchronisation patterns than the current state of the art. To improve the scalability of verification, we introduce a novel event-based representation of concurrency constraints and a graph-based technique for deadlock analysis. The implementation is distributed and fault-tolerant, and can verify X10 and Java programs.
To formalise the notion of barrier deadlock, we introduce a core language expressive enough to represent the three most widespread barrier synchronisation patterns: group, split-phase, and dynamic membership. We propose a graph analysis technique that selects between two alternative graph representations: the Wait-For Graph, which favours programs with more tasks than barriers, and the State Graph, which is optimised for programs with more barriers than tasks. We prove that finding a deadlock in either representation is equivalent, and that the verification algorithm is sound and complete with respect to the notion of deadlock in our core language.
Armus is evaluated with three benchmark suites in local and distributed scenarios. The benchmarks show that graph analysis with automatic graph-representation selection can yield a 7-fold execution speed-up over the traditional fixed graph representation. The performance measurements for distributed deadlock detection among 64 processes show negligible overheads.
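The choice between the two graph representations can be illustrated with a small sketch. It is not the Armus implementation: the input format (each blocked task maps to the event it waits on and the set of events it impedes), the edge definitions, the function names, and the size-based selection rule are assumptions made for illustration, guided only by the description in the abstract. In both graphs a deadlock shows up as a cycle.

```python
def wait_for_graph(tasks):
    """Nodes are tasks; edge t1 -> t2 when t1 waits on an event that t2 impedes."""
    impeders = {}
    for name, (_, impedes) in tasks.items():
        for event in impedes:
            impeders.setdefault(event, set()).add(name)
    return {name: set(impeders.get(waits, ())) for name, (waits, _) in tasks.items()}


def state_graph(tasks):
    """Nodes are events; edge e1 -> e2 when some task impedes e1 and waits on e2."""
    graph = {}
    for waits, impedes in tasks.values():
        graph.setdefault(waits, set())
        for event in impedes:
            graph.setdefault(event, set()).add(waits)
    return graph


def has_cycle(graph):
    """Iterative depth-first search; a grey (in-progress) successor means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}
    for root in graph:
        if colour.get(root, WHITE) != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                colour[node] = BLACK
                stack.pop()
            elif colour.get(nxt, WHITE) == GREY:
                return True
            elif colour.get(nxt, WHITE) == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(graph.get(nxt, ()))))
    return False


def is_deadlocked(tasks):
    """Assumed selection rule, following the abstract: more tasks than events
    picks the Wait-For Graph, otherwise the State Graph."""
    events = {waits for waits, _ in tasks.values()}
    events |= {event for _, impedes in tasks.values() for event in impedes}
    build = wait_for_graph if len(tasks) > len(events) else state_graph
    return has_cycle(build(tasks))


# Hypothetical snapshots: task name -> (event waited on, events impeded).
DEADLOCKED = {"t1": ("a", {"b"}), "t2": ("b", {"a"})}
RUNNING = {"t1": ("a", {"b"}), "t2": ("a", set())}
```

In `DEADLOCKED`, each task waits on the barrier the other has not yet reached, so both representations contain a two-node cycle. In `RUNNING`, no task impedes `a`, so neither graph has a cycle.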
Dynamic deadlock verification for general barrier synchronisation
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming