ABSTRACT
C or C++ programs relying on the pthreads interface for concurrency are required to use a specified set of functions to avoid data races, and to ensure memory visibility across threads. Although the detailed rules are not completely clear, it is not hard to refine them to a simple set of clear and uncontroversial rules for at least a subset of the C language that excludes structures (and hence bit-fields).
We precisely address the question of how locks in this subset must be implemented, and particularly when other memory operations can be reordered with respect to locks. This impacts the memory fences required in lock implementations, and hence has a significant performance impact. Along the way, we show that a significant class of common compiler transformations is actually safe in the presence of pthreads, something which appears to have received minimal attention in the past.
We show that, surprisingly to us, the reordering constraints are not symmetric for the lock and unlock operations. In particular, it is not always safe to move memory operations into a locked region by delaying them past a pthread_mutex_lock() call, but it is safe to move them into such a region by advancing them to before a pthread_mutex_unlock() call. We believe that this was not previously recognized, and there is evidence that it is under-appreciated among implementors of thread libraries.
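The asymmetry described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not code taken from this abstract: it uses pthread_mutex_trylock() as one way a second thread might notice that a lock has been acquired, and it uses C11 relaxed atomics in place of ordinary shared variables so that the sketch itself stays free of undefined behavior. The names x, done, writer, and observe_x_after_failed_trylock are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
static atomic_int x;      /* assumption: relaxed atomic standing in for a plain variable */
static atomic_bool done;  /* tells the writer that it may release the lock */

static void *writer(void *arg) {
    (void)arg;
    /* A store that precedes the lock call in program order. */
    atomic_store_explicit(&x, 42, memory_order_relaxed);
    pthread_mutex_lock(&l);
    /* Hold the lock until the observer has read x. */
    while (!atomic_load_explicit(&done, memory_order_acquire))
        sched_yield();
    pthread_mutex_unlock(&l);
    return NULL;
}

/* Returns the value of x seen right after a trylock on l first fails. */
int observe_x_after_failed_trylock(void) {
    pthread_t t;
    atomic_store(&x, 0);
    atomic_store(&done, false);
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;
    /* Spin until trylock fails, i.e. until the writer holds the lock. */
    while (pthread_mutex_trylock(&l) == 0)
        pthread_mutex_unlock(&l);
    int seen = atomic_load_explicit(&x, memory_order_relaxed);
    atomic_store_explicit(&done, true, memory_order_release);
    pthread_join(t, NULL);
    return seen;
}
```

If the store to x is never delayed past the pthread_mutex_lock() call, the observer should see 42 once its trylock fails. If an implementation or compiler does move the store into the locked region, the observer could plausibly see 0 instead. The mirror-image transformation, moving an operation that follows pthread_mutex_unlock() to just before it, has no comparable observer in this sketch, which is consistent with the claim that it is safe.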
Although our precise arguments are expressed in terms of statement reordering within a small subset language, we believe that our results capture the situation for a full C/C++ implementation. We also argue that our results are insensitive to the details of our literal (and reasonable, though possibly unintended) interpretation of the pthread standard. We believe that they accurately reflect hardware memory ordering constraints in addition to compiler constraints. And they appear to have implications beyond pthread environments.
Index Terms
Reordering constraints for pthread-style locks