Abstract
Efficient locking mechanisms are critically important for high-performance computers. On highly-threaded systems with a deep memory hierarchy, the throughput of traditional queueing locks, e.g., MCS locks, falls off due to NUMA effects. Two-level cohort locks perform better on NUMA systems, but fail to deliver top performance for deep NUMA hierarchies. In this paper, we describe a hierarchical variant of the MCS lock that adapts the principles of cohort locking for architectures with deep NUMA hierarchies. We describe analytical models for throughput and fairness of Cohort-MCS (C-MCS) and Hierarchical MCS (HMCS) locks that enable us to tailor these locks for high performance on any target platform without empirical tuning. Using these models, one can select parameters such that an HMCS lock will deliver better fairness than a C-MCS lock for a given throughput, or deliver better throughput for a given fairness. Our experiments show that, under high contention, a three-level HMCS lock delivers up to 7.6x higher lock throughput than a C-MCS lock on a 128-thread IBM Power 755 and a five-level HMCS lock delivers up to 72x higher lock throughput on a 4096-thread SGI UV 1000. On the K-means clustering code from the MineBench suite, a three-level HMCS lock reduces the running time by up to 55% compared to the C-MCS lock on an IBM Power 755.
High performance locks for multi-level NUMA systems
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming