skip to main content
research-article

An Efficient Abortable-locking Protocol for Multi-level NUMA Systems

Authors Info & Claims
Published:26 January 2017Publication History
Skip Abstract Section

Abstract

The popularity of Non-Uniform Memory Access (NUMA) architectures has led to numerous locality-preserving hierarchical lock designs, such as HCLH, HMCS, and cohort locks. Locality-preserving locks trade fairness for higher throughput. Hence, some instances of acquisitions can incur long latencies, which may be intolerable for certain applications. Few locks admit a waiting thread to abandon its protocol on a timeout. State-of-the-art abortable locks are not fully locality aware, introduce high overheads, and unsuitable for frequent aborts. Enhancing locality-aware locks with lightweight timeout capability is critical for their adoption. In this paper, we design and evaluate the HMCS-T lock, a Hierarchical MCS (HMCS) lock variant that admits a timeout. HMCS-T maintains the locality benefits of HMCS while ensuring aborts to be lightweight. HMCS-T offers the progress guarantee missing in most abortable queuing locks. Our evaluations show that HMCS-T offers the timeout feature at a moderate overhead over its HMCS analog. HMCS-T, used in an MPI runtime lock, mitigated the poor scalability of an MPI+OpenMP BFS code and resulted in 4.3x superior scaling.

References

  1. A. Amer, P. Balaji, W. Bland, W. Gropp, R. Latham, H. Lu, L. Oden, A. Pena, K. Raffenetti, S. Seo, T. Rajeev, and Z. Junchao. MPICH User's Guide, Version 3.2. http://www.mpich.org/static/downloads/3.2/mpich-3.2-userguide.pdf, 2015.Google ScholarGoogle Scholar
  2. A. Amer, H. Lu, P. Balaji, and S. Matsuoka. Characterizing MPI and Hybrid MPIGoogle ScholarGoogle Scholar
  3. Threads Applications at Scale: Case Study with BFS. In Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on, pages 1075--1083. IEEE, 2015.Google ScholarGoogle Scholar
  4. A. Amer, H. Lu, Y. Wei, P. Balaji, and S. Matsuoka. MPIGoogle ScholarGoogle Scholar
  5. Threads: Runtime Contention and Remedies. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, pages 239--248, New York, NY, USA, 2015. ACM.Google ScholarGoogle Scholar
  6. A. Amer, H. Lu, Y. Wei, H. Jeff, S. Matsuoka, and P. Balaji. Locking Aspects in Multithreaded MPI Implementations. Technical Report ANL/MCS-P6005-0516, 2016.Google ScholarGoogle Scholar
  7. M. Chabbi, M. Fagan, and J. Mellor-Crummey. High Performance Locks for Multi-level NUMA Systems. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 215--226, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Chabbi and J. Mellor-Crummey. Contention-conscious, Locality-preserving Locks. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '16, pages 22:1--22:14, New York, NY, USA, 2016. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. David, R. Guerraoui, and V. Trigonakis. Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 33--48, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Dice, V. J. Marathe, and N. Shavit. Lock Cohorting: A General Technique for Designing NUMA Locks. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 247--256, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Frigo, C. E. Leiserson, and K. H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI '98, pages 212--223, New York, NY, USA, 1998. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Hewlett Packard Enterprise. HP Integrity Superdome X. http://www8.hp.com/h20195/v2/GetPDF.aspx/c04383189.pdf.Google ScholarGoogle Scholar
  13. T. Hoefler, C. Siebert, and A. Lumsdaine. Scalable Communication Protocols for Dynamic Sparse Data Exchange. In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pages 159--168, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. J. Holzmann. The Model Checker SPIN. IEEE Transactions on Software Engineering -- Special issue on formal methods in software practice, 23(5):279--295, May 1997.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Intel Corp. An Introduction to the Intel® QuickPath Interconnect. http://www.intel.com/content/www/us/en/io/quickpath-technology/quick-path-interconnect-introduction-paper.html, 2009.Google ScholarGoogle Scholar
  16. P. Jayanti. Adaptive and Efficient Abortable Mutual Exclusion. In Proceedings of the Twenty-second Annual Symposium on Principles of Distributed Computing, PODC '03, pages 295--304, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. S. Magnusson, A. Landin, and E. Hagersten. Queue Locks on Cache Coherent Multiprocessors. In Proceedings of the 8th International Symposium on Parallel Processing, pages 165--171, 1994. Google ScholarGoogle ScholarCross RefCross Ref
  18. V. Marathe, M. Moir, and N. Shavit. Composite Abortable Locks. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, pages 1--10, April 2006.Google ScholarGoogle Scholar
  19. J. M. Mellor-Crummey and M. L. Scott. Algorithms for Scalable Synchronization on Shared-memory Multiprocessors. ACM Transactions on Computer Systems, 9(1):21--65, February 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Pareek and P. Woelfel. RMR-efficient Randomized Abortable Mutual Exclusion. In Proceedings of the 26th International Conference on Distributed Computing, DISC'12, pages 267--281, Berlin, Heidelberg, 2012. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. L. Scott. Non-blocking Timeout in Scalable Queue-based Spin Locks. In Proceedings of the Twenty-first Annual Symposium on Principles of Distributed Computing, PODC '02, pages 31--40, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. L. Scott and W. N. Scherer. Scalable Queue-based Spin Locks with Timeout. In Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, PPoPP '01, pages 44--52, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Scott, Michael. Non-Blocking Timeout in Scalable Queue-Based Spin Locks. https://www.cs.rochester.edu/research/synchronization/pseudocode/nb_timeout.html.Google ScholarGoogle Scholar
  24. SGI. SGI UV The World's Most Powerful In-Memory Supercomputers. https://www.sgi.com/products/servers/uv/.Google ScholarGoogle Scholar
  25. D. D. Sleator and R. E. Tarjan. Self-adjusting Binary Search Trees. Journal of the ACM, 32(3):652--686, July 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. L. G. Valiant. A Bridging Model for Parallel Computation. Communications of the ACM, 33(8):103--111, Aug. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An Efficient Abortable-locking Protocol for Multi-level NUMA Systems

                        Recommendations

                        Comments

                        Login options

                        Check if you have access through your login credentials or your institution to get full access on this article.

                        Sign in

                        Full Access

                        PDF Format

                        View or Download as a PDF file.

                        PDF

                        eReader

                        View online with eReader.

                        eReader
                        About Cookies On This Site

                        We use cookies to ensure that we give you the best experience on our website.

                        Learn more

                        Got it!