COATCheck: Verifying Memory Ordering at the Hardware-OS Interface

Published: 25 March 2016

Abstract

Modern computer systems include numerous compute elements, from CPUs to GPUs to accelerators. Harnessing their full potential requires well-defined, properly-implemented memory consistency models (MCMs), and low-level system functionality such as virtual memory and address translation (AT). Unfortunately, it is difficult to specify and implement hardware-OS interactions correctly; in the past, many hardware and OS specification mismatches have resulted in implementation bugs in commercial processors. In an effort to resolve this verification gap, this paper makes the following contributions. First, we present COATCheck, an address translation-aware framework for specifying and statically verifying memory ordering enforcement at the microarchitecture and operating system levels. We develop a domain-specific language for specifying ordering enforcement, for including ordering-related OS events and hardware micro-operations, and for programmatically enumerating happens-before graphs. Using a fast and automated static constraint solver, COATCheck can efficiently analyze interesting and important memory ordering scenarios for modern, high-performance, out-of-order processors. Second, we show that previous work on Virtual Address Memory Consistency (VAMC) does not capture every translation-related ordering scenario of interest, and that some such cases even fall outside the traditional scope of consistency. We therefore introduce the term transistency model to describe the superset of consistency which captures all translation-aware sets of ordering rules.
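The happens-before graphs mentioned in the abstract can be illustrated with a small sketch. This is not COATCheck's domain-specific language or its constraint solver; it is a minimal Python illustration, assuming the common convention in happens-before-graph analyses that a candidate execution whose graph contains a cycle cannot occur. The function name `has_cycle` and the event names (`pte_write`, `tlb_invalidate`, `load_translate`) are hypothetical and chosen only to evoke a translation-related scenario.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph given by (src, dst) edges has a cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    # Standard three-colour depth-first search:
    # WHITE = unvisited, GREY = on the current DFS path, BLACK = fully explored.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            if colour[m] == GREY:
                return True  # back edge: m is already on the current path
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# Hypothetical scenario: a page table entry is rewritten, the stale TLB
# entry is invalidated, and a later load translates its address.
ordered = [
    ("pte_write", "tlb_invalidate"),
    ("tlb_invalidate", "load_translate"),
]

# Adding an edge that says the load translated *before* the page table
# write (i.e. it used the stale mapping) closes a cycle.
stale = ordered + [("load_translate", "pte_write")]

print(has_cycle(ordered))  # acyclic: the ordering is consistent
print(has_cycle(stale))    # cyclic: the stale-translation outcome is ruled out
```

Under the stated assumption, a checker would enumerate candidate graphs for a test program and reject those outcomes for which every candidate graph is cyclic; the sketch shows only the per-graph cycle test.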

Published in

ACM SIGPLAN Notices, Volume 51, Issue 4
ASPLOS '16
April 2016
774 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2954679
Editor: Andy Gill

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN: 9781450340915
DOI: 10.1145/2872362
General Chair: Tom Conte
Program Chair: Yuanyuan Zhou

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
