High performance locks for multi-level NUMA systems

Published: 24 January 2015

Abstract

Efficient locking mechanisms are critically important for high performance computers. On highly threaded systems with a deep memory hierarchy, the throughput of traditional queueing locks, e.g., MCS locks, falls off due to NUMA effects. Two-level cohort locks perform better on NUMA systems, but fail to deliver top performance for deep NUMA hierarchies. In this paper, we describe a hierarchical variant of the MCS lock that adapts the principles of cohort locking for architectures with deep NUMA hierarchies. We describe analytical models for throughput and fairness of Cohort-MCS (C-MCS) and Hierarchical MCS (HMCS) locks that enable us to tailor these locks for high performance on any target platform without empirical tuning. Using these models, one can select parameters such that an HMCS lock will deliver better fairness than a C-MCS lock for a given throughput, or deliver better throughput for a given fairness. Our experiments show that, under high contention, a three-level HMCS lock delivers up to 7.6x higher lock throughput than a C-MCS lock on a 128-thread IBM Power 755 and a five-level HMCS lock delivers up to 72x higher lock throughput on a 4096-thread SGI UV 1000. On the K-means clustering code from the MineBench suite, a three-level HMCS lock reduces the running time by up to 55% compared to the C-MCS lock on an IBM Power 755.
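
To make the throughput and fairness trade-off concrete, the following is a toy sketch of the cohort idea, not the analytical model from the paper. It assumes a two-level lock under full contention, where a NUMA domain passes the lock to local waiters until a passing threshold is reached and then hands it to the next domain in round-robin order. The function names, the round-robin policy, and the cost parameters are all hypothetical and chosen only for illustration.

```python
def handoff_sequence(domains, threshold, acquisitions):
    """Return the NUMA domain that holds the lock at each acquisition.

    Assumption: every domain always has waiters, a domain keeps the lock
    for `threshold` consecutive acquisitions, then passes it round-robin.
    """
    if domains < 1 or threshold < 1:
        raise ValueError("domains and threshold must be at least 1")
    order = []
    domain = 0
    served = 0
    for _ in range(acquisitions):
        order.append(domain)
        served += 1
        if served == threshold:
            # Threshold reached: give up the lock to the next domain.
            domain = (domain + 1) % domains
            served = 0
    return order


def remote_handoffs(order):
    """Count handoffs that cross a domain boundary."""
    return sum(1 for a, b in zip(order, order[1:]) if a != b)


def average_handoff_cost(threshold, local_cost, remote_cost):
    """Mean cost per handoff: (threshold - 1) local passes, then 1 remote pass.

    The costs are illustrative inputs in arbitrary units, not measurements.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return ((threshold - 1) * local_cost + remote_cost) / threshold
```

In this sketch a larger threshold lowers the average handoff cost, which favors throughput, but waiters in other domains sit through more local acquisitions before their turn, which hurts fairness.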

Published in

ACM SIGPLAN Notices, Volume 50, Issue 8 (PPoPP '15), August 2015, 290 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2858788
Editor: Andy Gill

PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2015, 290 pages
ISBN: 9781450332057
DOI: 10.1145/2688500

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article
