research-article
Open Access

Foundations of empirical memory consistency testing

Published: 13 November 2020

Abstract

Modern memory consistency models are complex, and it is difficult to reason about the relaxed behaviors that current systems allow. Programming languages, such as C and OpenCL, offer a memory model interface that developers can use to safely write concurrent applications. This abstraction provides functional portability across any platform that implements the interface, regardless of differences in the underlying systems. This powerful abstraction hinges on the system correctly implementing the interface. Many techniques for memory consistency model validation use empirical testing, which has been effective at uncovering undocumented behaviors and even at finding bugs in trusted compilation schemes. Memory model testing is built around small concurrent unit tests called “litmus tests”. In these tests, certain observations, including potential bugs, are exceedingly rare: they may be triggered only by a precise, inherently probabilistic interleaving of system steps in a complex processor. Thus, each test must be run many times to provide a high level of confidence in its coverage.

In this work, we rigorously investigate empirical memory model testing. In particular, we propose methodologies for navigating complex stressing routines and analyzing large numbers of testing observations. Using these insights, we can more efficiently tune stressing parameters, which can lead to higher confidence results at a faster rate. We emphasize the need for such approaches by performing a meta-study of prior work, which reveals results with low reproducibility and inefficient use of testing time.

Our investigation is presented alongside empirical data. We believe that OpenCL targeting GPUs is a pragmatic choice in this domain, as there is a variety of platforms to test, from large HPC servers to power-efficient edge devices. The tests presented in this work span 3 GPUs from 3 different vendors. We show that our methodologies are applicable across the GPUs, despite significant variance in the results. Concretely, our results show: lossless speedups of more than 5× in tuning using data peeking; a definition of portable stressing parameters that loses only 12% efficiency when generalized across our domain; and a priority order of litmus tests for tuning. We stress test a conformance test suite for the OpenCL 2.0 memory model and discover a bug in Intel’s compiler. Our methods are evaluated on the other two GPUs using mutation testing. We end with recommendations for official memory model conformance tests.


Supplemental Material

Auxiliary Presentation Video



• Published in

  Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
  November 2020, 3108 pages
  EISSN: 2475-1421
  DOI: 10.1145/3436718

        Copyright © 2020 Owner/Author

        Publisher

        Association for Computing Machinery

        New York, NY, United States

