DOI: 10.1145/1693453.1693489
research-article

Analyzing lock contention in multithreaded applications

Published: 9 January 2010

ABSTRACT

Many programs exploit shared-memory parallelism using multithreading. Threaded codes typically use locks to coordinate access to shared data. In many cases, contention for locks reduces parallel efficiency and hurts scalability. Being able to quantify and attribute lock contention is important for understanding where a multithreaded program needs improvement.

This paper proposes and evaluates three strategies for gaining insight into performance losses due to lock contention. First, we consider using a straightforward strategy based on call stack profiling to attribute idle time and show that it fails to yield insight into lock contention. Second, we consider an approach that builds on a strategy previously used for analyzing idleness in work-stealing computations; we show that this strategy does not yield insight into lock contention. Finally, we propose a new technique for measurement and analysis of lock contention that uses data associated with locks to blame lock holders for the idleness of spinning threads. Our approach incurs ≤ 5% overhead on a quantum chemistry application that makes extensive use of locking (65M distinct locks, a maximum of 340K live locks, and an average of 30K lock acquisitions per second per thread) and attributes lock contention to its full static and dynamic calling contexts. Our strategy, implemented in HPCToolkit, is fully distributed and should scale well to systems with large core counts.
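The third strategy, blaming lock holders for the idleness of spinning threads, can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not HPCToolkit's implementation: the names `BlameLock`, `BlameProfiler`, `sample_waiter`, and the context string are hypothetical, and real sampling, atomics, and call stack unwinding are replaced by explicit method calls.

```python
from collections import defaultdict


class BlameLock:
    """Toy lock carrying the per-lock data used for blame (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.holder = None      # thread id currently holding the lock
        self.pending_blame = 0  # idleness samples taken by waiting threads


class BlameProfiler:
    """Accumulates blamed idleness per calling context of lock holders."""

    def __init__(self):
        self.blame = defaultdict(int)

    def try_acquire(self, lock, thread):
        # Succeeds only when the lock is free; otherwise the caller "spins".
        if lock.holder is None:
            lock.holder = thread
            return True
        return False

    def sample_waiter(self, lock):
        # A profiling sample lands in a thread spinning on `lock`:
        # record the idleness on the lock instead of on the waiter.
        lock.pending_blame += 1

    def release(self, lock, context):
        # The holder accepts the idleness accumulated while it held the
        # lock and attributes it to its own calling context.
        self.blame[context] += lock.pending_blame
        lock.pending_blame = 0
        lock.holder = None


profiler = BlameProfiler()
lock = BlameLock("table_lock")

profiler.try_acquire(lock, "T0")            # T0 holds the lock
spinning = not profiler.try_acquire(lock, "T1")  # T1 must wait
for _ in range(3):                          # three samples hit T1 while it spins
    profiler.sample_waiter(lock)
profiler.release(lock, "main > update_table")  # T0 accepts the blame

print(dict(profiler.blame))
```

In this sketch the waiting thread never records anything against itself; the three idle samples end up attributed to the holder's context, which is the code a developer would need to shorten or restructure to reduce contention.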

Published in

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2010
372 pages
ISBN: 9781605588773
DOI: 10.1145/1693453

ACM SIGPLAN Notices, Volume 45, Issue 5
PPoPP '10
May 2010
346 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1837853

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 230 of 1,014 submissions, 23%
