research-article

Noise Injection Techniques to Expose Subtle and Unintended Message Races

Published: 26 January 2017

Abstract

Debugging intermittent bugs in MPI applications is challenging, and message races, in which two or more sends race to match with a receive, are a common root cause. Many debugging tools have been proposed to help programmers resolve them, but their runtime interference perturbs the timing, so subtle races often cannot be reproduced while a debugging tool is attached. We present novel noise injection techniques that expose message races even under a tool's control. We first formalize the race problem in the context of non-deterministic parallel applications and use this analysis to determine an effective noise-injection strategy for uncovering races. We codified these techniques in NINJA (Noise INJection Agent), which exposes these races without modification to the application. Our evaluations on synthetic cases as well as a real-world bug in Hypre-2.10.1 show that NINJA significantly helps expose races.
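To make the idea of a message race concrete, the following is a toy simulation, not NINJA's implementation. It assumes two senders whose messages target the same receive, a fixed network latency, and an independent random delay added to each send; the function names, the send times, the latency, and the noise bound are all hypothetical values chosen for illustration.

```python
import random


def match_order(send_times, latency, delays):
    """Return sender ranks in the order their messages reach the receiver."""
    arrivals = [
        (t + latency + d, rank)
        for rank, (t, d) in enumerate(zip(send_times, delays))
    ]
    return [rank for _, rank in sorted(arrivals)]


def count_orders(trials, max_noise, seed=0):
    """Count how often each arrival order occurs over repeated simulated runs."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    counts = {}
    for _ in range(trials):
        # Rank 0 sends one time unit before rank 1 (assumed values);
        # each send is delayed by independent noise in [0, max_noise].
        delays = [rng.uniform(0.0, max_noise) for _ in range(2)]
        order = tuple(match_order([0.0, 1.0], latency=5.0, delays=delays))
        counts[order] = counts.get(order, 0) + 1
    return counts


# Without injected noise the timing gap hides the race: one order only.
baseline = count_orders(trials=1000, max_noise=0.0)
# With noise larger than the gap, the other order shows up in some runs.
noisy = count_orders(trials=1000, max_noise=4.0)
print("no noise:  ", baseline)
print("with noise:", noisy)
```

In this model the race is always present, but it only becomes observable once the injected delay can exceed the gap between the two sends, which is the intuition behind using noise to expose races that ordinary runs keep hidden.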



    Published in

      ACM SIGPLAN Notices, Volume 52, Issue 8
      PPoPP '17
      August 2017
      442 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/3155284

      PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      January 2017
      476 pages
      ISBN: 9781450344937
      DOI: 10.1145/3018743

      Copyright © 2017 ACM

      Publisher: Association for Computing Machinery, New York, NY, United States

