Abstract
Modern computer systems include numerous compute elements, from CPUs to GPUs to accelerators. Harnessing their full potential requires well-defined, properly implemented memory consistency models (MCMs) and low-level system functionality such as virtual memory and address translation (AT). Unfortunately, it is difficult to specify and implement hardware-OS interactions correctly; in the past, many hardware and OS specification mismatches have resulted in implementation bugs in commercial processors. In an effort to resolve this verification gap, this paper makes the following contributions. First, we present COATCheck, an address translation-aware framework for specifying and statically verifying memory ordering enforcement at the microarchitecture and operating system levels. We develop a domain-specific language for specifying ordering enforcement, for including ordering-related OS events and hardware micro-operations, and for programmatically enumerating happens-before graphs. Using a fast and automated static constraint solver, COATCheck can efficiently analyze interesting and important memory ordering scenarios for modern, high-performance, out-of-order processors. Second, we show that previous work on Virtual Address Memory Consistency (VAMC) does not capture every translation-related ordering scenario of interest, and that some such cases even fall outside the traditional scope of consistency. We therefore introduce the term transistency model to describe the superset of consistency that captures all translation-aware sets of ordering rules.
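As a rough illustration of what analyzing a happens-before graph can involve, the Python sketch below checks a small directed graph for cycles. It is not COATCheck's domain-specific language or its constraint solver. The event names (`pte_write`, `tlb_invalidate`, `load_translate`, `load_access`) and the edges between them are hypothetical, chosen only to suggest a translation-related scenario. The sketch assumes the common convention in happens-before analyses that a candidate execution whose graph contains a cycle cannot occur.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
        graph[dst]  # make sure nodes with no outgoing edges are also keys

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is on the current path
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


# Hypothetical ordering edges: the OS updates a page table entry, then
# invalidates the TLB entry; a later load translates and then accesses memory.
base_edges = [
    ("pte_write", "tlb_invalidate"),
    ("tlb_invalidate", "load_translate"),
    ("load_translate", "load_access"),
]

# Hypothetical outcome in which the load used the OLD mapping: its translation
# must then have happened before the page table write, which adds a back edge.
stale_mapping_edges = base_edges + [("load_translate", "pte_write")]

print(has_cycle(base_edges))           # no cycle: this ordering is consistent
print(has_cycle(stale_mapping_edges))  # cycle: the stale outcome is ruled out
```

Under these assumed edges, the stale-mapping outcome yields a cyclic graph and would be classified as forbidden, while the base ordering stays acyclic.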
COATCheck: Verifying Memory Ordering at the Hardware-OS Interface
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems