research-article

Contention-conscious, locality-preserving locks

Published: 27 February 2016

Abstract

Over the last decade, the growing use of cache-coherent NUMA architectures has spurred the development of numerous locality-preserving mutual exclusion algorithms. NUMA-aware locks such as HCLH, HMCS, and cohort locks exploit locality of reference among nearby threads to deliver high lock throughput under high contention. However, the hierarchical nature of these locality-aware locks increases latency, which reduces the throughput of uncontended or lightly contended critical sections. To date, no lock design for NUMA systems has delivered both low latency under low contention and high throughput under high contention.

In this paper, we describe the design and evaluation of an adaptive mutual exclusion scheme (AHMCS lock), which employs several orthogonal strategies: a hierarchical MCS (HMCS) lock for high throughput under high contention, Lamport's fast path approach for low latency under low contention, an adaptation mechanism that employs hysteresis to balance latency and throughput under moderate contention, and hardware transactional memory for the lowest latency in the absence of contention. The result is a top-performing lock that has most of the properties of an ideal mutual exclusion algorithm. AHMCS exploits the strengths of multiple contention management techniques to deliver high performance over a broad range of contention levels. Our empirical evaluations demonstrate the effectiveness of AHMCS over prior art.
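
To make the combination of a fast path, a queued slow path, and hysteresis concrete, here is a minimal Python sketch. It is not the AHMCS algorithm: the class name `AdaptiveLockSketch`, the `high` and `low` thresholds, and the per-thread score are all assumptions made for illustration. A plain `threading.Lock` stands in for the hierarchical MCS queue, and hardware transactional memory is not modeled at all.

```python
import threading


class AdaptiveLockSketch:
    """Illustrative adaptive lock: fast path, queued slow path, hysteresis."""

    def __init__(self, high=4, low=0):
        self._root = threading.Lock()    # always held inside the critical section
        self._queue = threading.Lock()   # stand-in for a queue lock such as MCS
        self._local = threading.local()  # per-thread contention estimate
        self._high = high                # score at which a thread leaves the fast path
        self._low = low                  # score at which it returns to the fast path

    def _thread_state(self):
        s = self._local
        if not hasattr(s, "score"):
            s.score = 0
            s.use_fast_path = True
        return s

    def _adapt(self, s, contended):
        # Move the score toward `high` on contention, toward `low` otherwise.
        if contended:
            s.score = min(s.score + 1, self._high)
        else:
            s.score = max(s.score - 1, self._low)
        # Hysteresis: the mode changes only at the two ends of the band.
        if s.score >= self._high:
            s.use_fast_path = False
        elif s.score <= self._low:
            s.use_fast_path = True

    def acquire(self):
        """Acquire the lock; return "fast" or "slow" to show the path taken."""
        s = self._thread_state()
        tried_fast = s.use_fast_path
        if tried_fast and self._root.acquire(blocking=False):
            self._adapt(s, contended=False)
            return "fast"
        # Slow path: line up behind other contenders, then take the root lock.
        with self._queue:
            was_free = self._root.acquire(blocking=False)
            if not was_free:
                self._root.acquire()
        self._adapt(s, contended=tried_fast or not was_free)
        return "slow"

    def release(self):
        self._root.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()


def run_demo(num_threads=4, iters=1000):
    """Increment a shared counter under the lock and return its final value."""
    lock = AdaptiveLockSketch()
    total = [0]

    def worker():
        for _ in range(iters):
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

In this sketch every thread ends up holding the root lock while it is in the critical section, so mutual exclusion does not depend on which path was taken. The two thresholds keep a thread from flipping between paths on every acquisition: it leaves the fast path only after repeated contention and returns only after a run of uncontended acquisitions.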

Published in

ACM SIGPLAN Notices, Volume 51, Issue 8 (PPoPP '16), August 2016, 405 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3016078

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2016, 420 pages
ISBN: 9781450340922
DOI: 10.1145/2851141

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
