Abstract
Building correct and efficient concurrent algorithms is known to be a difficult problem of fundamental importance. To achieve efficiency, designers try to remove unnecessary and costly synchronization. However, not only is this manual trial-and-error process ad hoc, time-consuming, and error-prone, but it often leaves designers pondering the question: is it inherently impossible to eliminate certain synchronization, or was I merely unable to eliminate it on this attempt, so that I should keep trying?
In this paper we respond to this question. We prove that it is impossible to build concurrent implementations of classic and ubiquitous specifications, such as sets, queues, stacks, mutual exclusion, and read-modify-write operations, that completely eliminate the use of expensive synchronization.
We prove that one cannot avoid the use of either: i) read-after-write (RAW), where a write to shared variable A is followed by a read of a different shared variable B without a write to B in between, or ii) atomic write-after-read (AWAR), where an atomic operation reads and then writes to shared locations. Unfortunately, enforcing RAW or AWAR is expensive on all current mainstream processors. To enforce RAW, memory ordering instructions (also called fences or barriers) must be used. To enforce AWAR, atomic instructions such as compare-and-swap are required. These instructions are typically substantially slower than regular instructions.
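The two patterns can be illustrated with a minimal C++ sketch. It is not taken from the paper: the names `flag_a`, `flag_b`, `lock_word`, `try_enter_raw`, `try_lock_awar`, and `single_thread_check` are hypothetical, and the choice of `std::atomic_thread_fence` and `compare_exchange_strong` is just one common way to express a fence and a compare-and-swap.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables, named only for this illustration.
std::atomic<int> flag_a{0};     // shared variable A
std::atomic<int> flag_b{0};     // shared variable B
std::atomic<int> lock_word{0};  // location used by the AWAR example

// RAW: a write to A followed by a read of a different variable B.
// The sequentially consistent fence sits between the store and the load
// so that the read of B is not ordered ahead of the write to A.
bool try_enter_raw() {
    flag_a.store(1, std::memory_order_relaxed);           // write A
    std::atomic_thread_fence(std::memory_order_seq_cst);  // memory ordering instruction
    return flag_b.load(std::memory_order_relaxed) == 0;   // read B
}

// AWAR: a single atomic operation that reads and then writes.
// compare_exchange_strong reads lock_word and writes 1 only if it read 0.
bool try_lock_awar() {
    int expected = 0;
    return lock_word.compare_exchange_strong(expected, 1);
}

// Single-threaded sanity check of the two helpers above.
void single_thread_check() {
    assert(try_enter_raw());   // B is still 0, so entry succeeds
    assert(try_lock_awar());   // first compare-and-swap acquires the word
    assert(!try_lock_awar());  // second attempt observes 1 and fails
}
```

In a real algorithm a second thread would run the mirror image of `try_enter_raw` (write B, fence, read A); the sketch only shows where the fence or the atomic read-modify-write instruction appears, which is the cost the abstract refers to.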
Although algorithm designers frequently struggle to avoid RAW and AWAR, their attempts are often futile. Our result characterizes the cases where avoiding RAW and AWAR is impossible. On the flip side, our result can be used to guide designers towards new algorithms where RAW and AWAR can be eliminated.
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated
POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages