Abstract
Non-Uniform Memory Access (NUMA) architectures are gaining importance in mainstream computing systems due to the rapid growth of multi-core, multi-chip machines. Extracting the best possible performance from these machines will require us to revisit the design of the concurrent algorithms and synchronization primitives that form the building blocks of many of today's applications. This paper revisits one such critical synchronization primitive: the reader-writer lock.
We present what is, to the best of our knowledge, the first family of reader-writer lock algorithms tailored to NUMA architectures. Its variants trade fairness between readers and writers for higher concurrency among readers and better back-to-back batching of writers from the same NUMA node. Our algorithms use the lock cohorting technique to synchronize writers in a NUMA-friendly fashion, binary flags to coordinate readers and writers, and simple distributed reader counters to enable NUMA-friendly concurrency among readers. The result is a collection of surprisingly simple NUMA-aware algorithms that outperform state-of-the-art reader-writer locks by up to a factor of 10 in our microbenchmark experiments.
To evaluate our algorithms in a realistic setting, we also report performance results for the kccachetest benchmark of the Kyoto-Cabinet distribution, an open-source database that makes heavy use of pthread reader-writer locks. Our locks boost the performance of kccachetest by up to 40% over the best prior alternatives.
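To make the three ingredients named above concrete (a cohort lock for writers, a binary flag that readers consult, and per-node reader counters), here is a minimal C11 sketch. It is an illustration under stated assumptions, not the paper's algorithm: it picks one point in the fairness trade-off (writers are favored), it assumes a fixed MAX_NODES and that each caller passes its own NUMA node index (a real lock would derive it from the running CPU), and it lets the cohort lock's top-level word double as the flag readers check. The writer side is a simplified cohort lock built from a global test-and-set lock plus per-node ticket locks. All identifiers (numa_rwlock_t, writer_lock, HANDOFF_LIMIT, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizing knobs; a real lock would query the machine topology. */
#define MAX_NODES     4
#define HANDOFF_LIMIT 64  /* max consecutive same-node writer handoffs */

/* Per-node ticket lock plus cohort bookkeeping, on its own cache line.
 * has_global and handoffs are touched only by the local-lock holder. */
typedef struct {
    _Alignas(64) atomic_uint next;  /* next ticket to hand out */
    atomic_uint owner;              /* ticket currently being served */
    bool has_global;                /* this node's cohort still owns global */
    unsigned handoffs;              /* local handoffs in the current batch */
} node_lock_t;

/* Per-node reader counter on its own cache line. */
typedef struct {
    _Alignas(64) atomic_long count;
} reader_slot_t;

typedef struct {
    atomic_bool global;             /* top-level cohort lock; readers also
                                       treat it as the "writers active" flag */
    node_lock_t local[MAX_NODES];
    reader_slot_t readers[MAX_NODES];
} numa_rwlock_t;

void rw_init(numa_rwlock_t *l) {
    atomic_init(&l->global, false);
    for (int n = 0; n < MAX_NODES; n++) {
        atomic_init(&l->local[n].next, 0);
        atomic_init(&l->local[n].owner, 0);
        l->local[n].has_global = false;
        l->local[n].handoffs = 0;
        atomic_init(&l->readers[n].count, 0);
    }
}

/* Writer: take the node-local ticket lock, then the global lock unless the
 * previous writer from this node passed it along, then drain readers. */
void writer_lock(numa_rwlock_t *l, int node) {
    node_lock_t *nl = &l->local[node];
    unsigned t = atomic_fetch_add(&nl->next, 1);
    while (atomic_load(&nl->owner) != t)
        ;                                   /* spin on a node-local line */
    if (!nl->has_global) {
        while (atomic_load(&l->global) || atomic_exchange(&l->global, true))
            ;                               /* test-and-test-and-set */
        nl->has_global = true;
    }
    for (int n = 0; n < MAX_NODES; n++)     /* wait for readers to leave */
        while (atomic_load(&l->readers[n].count) != 0)
            ;
}

void writer_unlock(numa_rwlock_t *l, int node) {
    node_lock_t *nl = &l->local[node];
    unsigned t = atomic_load(&nl->owner);
    bool local_waiters = atomic_load(&nl->next) != t + 1;
    if (local_waiters && nl->handoffs < HANDOFF_LIMIT) {
        nl->handoffs++;                     /* keep global: batch same-node writers */
    } else {
        nl->handoffs = 0;
        nl->has_global = false;
        atomic_store(&l->global, false);    /* admit readers and other nodes */
    }
    atomic_store(&nl->owner, t + 1);        /* pass the local lock on */
}

/* Reader: announce on the node-local counter, then re-check the flag.
 * Default seq_cst ordering makes this Dekker-style handshake safe. */
void reader_lock(numa_rwlock_t *l, int node) {
    atomic_long *c = &l->readers[node].count;
    for (;;) {
        while (atomic_load(&l->global))
            ;                               /* a writer batch is in progress */
        atomic_fetch_add(c, 1);
        if (!atomic_load(&l->global))
            return;                         /* no writer raced in: enter */
        atomic_fetch_sub(c, 1);             /* back off and retry */
    }
}

void reader_unlock(numa_rwlock_t *l, int node) {
    long prev = atomic_fetch_sub(&l->readers[node].count, 1);
    assert(prev > 0 && "reader_unlock without reader_lock");
    (void)prev;
}
```

Readers write only their own node's counter line and read the shared flag, so readers on different nodes do not contend on a single counter. A writer releases the top-level lock only when no writer on its node is waiting or after HANDOFF_LIMIT consecutive handoffs, which batches writers from one node; because readers back off whenever the flag is set, a steady stream of writers can starve readers in this variant.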
Index Terms
NUMA-aware reader-writer locks