Abstract
Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines' non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful.
Lock cohorting allows one to transform any spin-lock algorithm, with minimal non-intrusive changes, into a scalable NUMA-aware spin-lock. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive a CLH-based cohort abortable lock, the first NUMA-aware queue lock to support abortability.
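To illustrate the general shape of such a transformation, the following is a minimal sketch and not the paper's implementation. It assumes a two-level design: one global lock that any thread may release, plus one local lock per NUMA node whose holder can tell whether other threads on the same node are waiting. The names `CohortLock`, `_Node`, `max_handoffs`, and `run_demo` are hypothetical. Blocking `threading.Lock` objects stand in for spin-locks, and the caller passes its node id explicitly rather than querying the hardware.

```python
import threading


class _Node:
    """Per-NUMA-node state (hypothetical helper for this sketch)."""

    def __init__(self):
        self.local = threading.Lock()  # local lock shared by the node's threads
        self.guard = threading.Lock()  # protects `waiters`
        self.waiters = 0               # threads that announced intent to take `local`
        self.owns_global = False       # only touched while holding `local`
        self.handoffs = 0              # consecutive local handoffs so far


class CohortLock:
    """Sketch of a cohort-style lock: a global lock plus one local lock per node."""

    def __init__(self, num_nodes, max_handoffs=64):
        # threading.Lock may be released by a thread other than the acquirer,
        # which this sketch relies on for the global lock.
        self._global = threading.Lock()
        self._nodes = [_Node() for _ in range(num_nodes)]
        self._max_handoffs = max_handoffs  # assumed fairness bound

    def acquire(self, node_id):
        node = self._nodes[node_id]
        with node.guard:
            node.waiters += 1
        node.local.acquire()
        with node.guard:
            node.waiters -= 1
        # If the previous local holder did not pass on global ownership,
        # compete for the global lock on behalf of this node.
        if not node.owns_global:
            self._global.acquire()
            node.owns_global = True
            node.handoffs = 0

    def release(self, node_id):
        node = self._nodes[node_id]
        with node.guard:
            has_cohort = node.waiters > 0
        if has_cohort and node.handoffs < self._max_handoffs:
            # Hand off within the node: keep the global lock, release only
            # the local one so a same-node waiter enters next.
            node.handoffs += 1
            node.local.release()
        else:
            # No local waiter, or the bound was reached: give other nodes a turn.
            node.owns_global = False
            self._global.release()
            node.local.release()


def run_demo(num_nodes=2, threads_per_node=4, iters=200):
    """Increment a shared counter under the lock and return its final value."""
    lock = CohortLock(num_nodes)
    total = [0]

    def worker(node_id):
        for _ in range(iters):
            lock.acquire(node_id)
            total[0] += 1
            lock.release(node_id)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(num_nodes) for _ in range(threads_per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The bound on consecutive handoffs is a design choice in this sketch: without it, a busy node could keep the global lock indefinitely and starve threads on other nodes.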
We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA-oblivious locks on a synthetic micro-benchmark, a real-world key-value store application (memcached), and the libc memory allocator. Our results demonstrate that cohort locks perform as well as or better than known locks when the load is low and significantly outperform them as the load increases.
Lock cohorting: a general technique for designing NUMA locks
PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming






