research-article

Precise Predictive Analysis for Discovering Communication Deadlocks in MPI Programs

Published: 17 August 2017

Abstract

The Message Passing Interface (MPI) is the standard API for parallelization in high-performance and scientific computing. Communication deadlocks are a frequent problem in MPI programs, and this article addresses the problem of discovering such deadlocks. We begin by showing that if an MPI program is single-path, the problem of discovering communication deadlocks is NP-complete. We then present a novel propositional encoding scheme that captures the existence of communication deadlocks. The encoding is based on modeling executions with partial orders and is implemented in a tool called MOPPER. The tool executes an MPI program, collects the trace, builds a formula from the trace using the propositional encoding scheme, and checks its satisfiability. Finally, we present experimental results that quantify the benefit of the approach in comparison to other analyzers and demonstrate that it offers a scalable solution for single-path programs.
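To make the problem concrete, here is a minimal sketch. It is not MOPPER's propositional encoding; it swaps in a plain explicit-state search. It assumes each process of a single-path program is a fixed sequence of blocking sends and receives, treats a receive whose source is None as a wildcard (MPI_ANY_SOURCE), and assumes zero-buffer (rendezvous) semantics. Under those assumptions it explores every way of matching sends to receives and reports whether some reachable state has a blocked process with no matchable send/receive pair. The function name can_deadlock and the trace format are hypothetical. The search is exponential in the worst case, which is consistent with the NP-completeness result above; the article instead builds a formula from the trace and checks its satisfiability.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical trace format: one operation is ("send", dest) or ("recv", src).
# A src of None on a receive models a wildcard receive (MPI_ANY_SOURCE).
Op = Tuple[str, Optional[int]]


def can_deadlock(procs: Sequence[Sequence[Op]]) -> bool:
    """Return True if some matching of wildcard receives reaches a deadlock.

    Assumptions (not taken from the article): every operation is blocking,
    sends are zero-buffer (a send completes only when a receive matches it),
    and each process is single-path, so its operation sequence is fixed.
    """
    n = len(procs)
    seen = set()
    stack = [tuple(0 for _ in range(n))]  # one program counter per process
    while stack:
        pcs = stack.pop()
        if pcs in seen:
            continue
        seen.add(pcs)
        if all(pcs[i] == len(procs[i]) for i in range(n)):
            continue  # every process finished along this path
        successors = []
        for r in range(n):  # candidate receiver
            if pcs[r] == len(procs[r]):
                continue
            kind, src = procs[r][pcs[r]]
            if kind != "recv":
                continue
            for s in range(n):  # candidate sender
                if s == r or pcs[s] == len(procs[s]):
                    continue
                s_kind, dest = procs[s][pcs[s]]
                if s_kind == "send" and dest == r and (src is None or src == s):
                    nxt = list(pcs)
                    nxt[r] += 1  # receive completes
                    nxt[s] += 1  # matching send completes
                    successors.append(tuple(nxt))
        if not successors:
            return True  # unfinished processes remain, but nothing can match
        stack.extend(successors)
    return False


if __name__ == "__main__":
    # P0: wildcard receive, then receive from P1; P1 and P2 both send to P0.
    # If the wildcard consumes P1's message, P0 waits on P1 while P2 stays blocked.
    print(can_deadlock([[("recv", None), ("recv", 1)], [("send", 0)], [("send", 0)]]))  # True
    # With explicit sources the bad matching is impossible.
    print(can_deadlock([[("recv", 1), ("recv", 2)], [("send", 0)], [("send", 0)]]))  # False
```

Because the sketch assumes zero-buffer sends, it also reports programs such as two processes that each send first; an MPI implementation that buffers small messages may let such a program complete, so the check flags deadlocks that are possible under the strictest buffering assumption.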

      • Published in

        ACM Transactions on Programming Languages and Systems, Volume 39, Issue 4
        December 2017
        191 pages
        ISSN: 0164-0925
        EISSN: 1558-4593
        DOI: 10.1145/3133234

        Copyright © 2017 Owner/Author

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 August 2017
        • Accepted: 1 May 2017
        • Revised: 1 November 2016
        • Received: 1 November 2015


        Qualifiers

        • Research
        • Refereed
