Abstract

The popularity of Non-Uniform Memory Access (NUMA) architectures has led to numerous locality-preserving hierarchical lock designs, such as HCLH, HMCS, and cohort locks. Locality-preserving locks trade fairness for higher throughput. Hence, some lock acquisitions can incur long latencies, which may be intolerable for certain applications. Few locks allow a waiting thread to abandon its acquisition protocol on a timeout. State-of-the-art abortable locks are not fully locality aware, introduce high overheads, and are unsuitable for frequent aborts. Enhancing locality-aware locks with a lightweight timeout capability is critical for their adoption. In this paper, we design and evaluate the HMCS-T lock, a Hierarchical MCS (HMCS) lock variant that admits a timeout. HMCS-T maintains the locality benefits of HMCS while ensuring that aborts are lightweight. HMCS-T also offers the progress guarantee missing in most abortable queuing locks. Our evaluations show that HMCS-T offers the timeout feature at a moderate overhead over its HMCS analog. Used as the lock in an MPI runtime, HMCS-T mitigated the poor scalability of an MPI+OpenMP BFS code and yielded 4.3x better scaling.
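The abstract does not spell out the HMCS-T protocol, so the sketch below is not HMCS-T. It is a minimal illustration, under our own assumptions, of what "abandoning the acquisition on a timeout" means to a caller, built on a simple test-and-test-and-set (TATAS) spin lock instead of a queue. The names `tatas_lock_t`, `tatas_acquire_timeout`, and `tatas_release` are hypothetical. Aborting is trivial here because a TATAS waiter publishes no state; in MCS-style queue locks such as HMCS, an aborting waiter's node is already linked into a queue that other threads may be modifying, which is what makes lightweight aborts hard there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical illustration: a timed test-and-test-and-set (TATAS) spin lock.
   This is NOT HMCS-T; it only shows the "give up at a deadline" interface. */
typedef struct {
    atomic_bool held; /* true while some thread owns the lock */
} tatas_lock_t;

/* Current time in nanoseconds. C11 timespec_get(TIME_UTC) is wall-clock time;
   on POSIX systems, clock_gettime(CLOCK_MONOTONIC) would be the safer choice. */
static uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

static void tatas_init(tatas_lock_t *l) { atomic_init(&l->held, false); }

/* Try to acquire the lock until timeout_ns nanoseconds have elapsed.
   Returns true on success, false if the caller abandoned the attempt.
   Aborting costs nothing here: the waiter never published any state that
   other threads depend on, unlike a node linked into an MCS-style queue. */
static bool tatas_acquire_timeout(tatas_lock_t *l, uint64_t timeout_ns) {
    const uint64_t deadline = now_ns() + timeout_ns;
    for (;;) {
        /* "Test": spin on loads, which hit the local cache while the lock is held. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed)) {
            if (now_ns() >= deadline)
                return false; /* timeout: simply stop waiting */
        }
        /* "Test-and-set": the lock looked free, so try to claim it. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return true;
        if (now_ns() >= deadline)
            return false; /* lost the race and the deadline has passed */
    }
}

static void tatas_release(tatas_lock_t *l) {
    /* Releasing a lock that is not held indicates a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A caller that receives `false` can back off or do other work, giving the bounded waiting time the abstract motivates. Note that a TATAS lock is neither fair nor locality aware, so it lacks exactly the properties HMCS-T is designed to keep while supporting aborts.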
An Efficient Abortable-locking Protocol for Multi-level NUMA Systems
PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming