Abstract
Exploiting today's multiprocessors requires high-performance and correct concurrent systems code (optimising compilers, language runtimes, OS kernels, etc.), which in turn requires a good understanding of the observable processor behaviour that can be relied on. Unfortunately this critical hardware/software interface is not at all clear for several current multiprocessors.
In this paper we characterise the behaviour of IBM POWER multiprocessors, which have a subtle and highly relaxed memory model (ARM multiprocessors have a very similar architecture in this respect). We have conducted extensive experiments on several generations of processors: POWER G5, 5, 6, and 7. Based on these, on published details of the microarchitectures, and on discussions with IBM staff, we give an abstract-machine semantics that abstracts from most of the implementation detail but explains the behaviour of a range of subtle examples. Our semantics is explained in prose but defined in rigorous machine-processed mathematics; we also confirm that it captures the observable processor behaviour, or the architectural intent, for our examples with an executable checker. While not officially sanctioned by the vendor, we believe that this model gives a reasonable basis for reasoning about current POWER multiprocessors.
Our work should bring new clarity to concurrent systems programming for these architectures, and is a necessary precondition for any analysis or verification. It should also inform the design of languages such as C and C++, where the language memory model is constrained by what can be efficiently compiled to such multiprocessors.
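As a rough illustration of the kind of "subtle example" such a model has to explain, the sketch below enumerates the outcomes of a two-thread message-passing test under sequential consistency, that is, under plain interleaving of the two threads. The test shape, the variable names (`x`, `y`, `r1`, `r2`), and the helper functions are assumptions chosen for illustration; they are not taken from the paper, and this is not its abstract machine. The point is only that interleaving semantics never yields `r1 = 1, r2 = 0`, whereas a relaxed model such as POWER's can permit that outcome when no barriers or dependencies constrain the accesses.

```python
from itertools import combinations

# Hypothetical message-passing test, initially x = y = 0.
# Thread 0 writes data (x) then a flag (y); thread 1 reads the flag then the data.
T0 = [("store", "x", 1), ("store", "y", 1)]
T1 = [("load", "y", "r1"), ("load", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory; return (r1, r2)."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "store":
            memory[var] = arg
        else:
            regs[arg] = memory[var]
    return regs["r1"], regs["r2"]


def sc_outcomes():
    """All (r1, r2) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(T0, T1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

The outcome `(1, 0)` is absent from the computed set, so observing it on hardware is evidence of behaviour that interleaving alone cannot explain.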
Understanding POWER multiprocessors
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation






