skip to main content
research-article

PREDATOR: predictive false sharing detection

Authors Info & Claims
Published:06 February 2014Publication History
Skip Abstract Section

Abstract

False sharing is a notorious problem for multithreaded applications that can drastically degrade both performance and scalability. Existing approaches can precisely identify the sources of false sharing, but only report false sharing actually observed during execution; they do not generalize across executions. Because false sharing is extremely sensitive to object layout, these detectors can easily miss false sharing problems that can arise due to slight differences in memory allocation order or object placement decisions by the compiler. In addition, they cannot predict the impact of false sharing on hardware with different cache line sizes.

This paper presents PREDATOR, a predictive software-based false sharing detector. PREDATOR generalizes from a single execution to precisely predict false sharing that is latent in the current execution. PREDATOR tracks accesses within a range that could lead to false sharing given different object placement. It also tracks accesses within virtual cache lines, contiguous memory ranges that span actual hardware cache lines, to predict sharing on hardware platforms with larger cache line sizes. For each, it reports the exact program location of predicted false sharing problems, ranked by their projected impact on performance. We evaluate PREDATOR across a range of benchmarks and actual applications. PREDATOR identifies problems undetectable with previous tools, including two previously-unknown false sharing problems, with no false positives. PREDATOR is able to immediately locate false sharing problems in MySQL and the Boost library that had eluded detection for years.

References

  1. E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pages 117--128, Cambridge, MA, Nov. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. E. D. Berger, B. G. Zorn, and K. S. McKinley. Composing high-performance memory allocators. In Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, PLDI '01, pages 114--124, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Bienia and K. Li. PARSEC 2.0: A new benchmark suite for chip-multiprocessors. In Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, June 2009.Google ScholarGoogle Scholar
  4. W. J. Bolosky and M. L. Scott. False sharing and its effect on shared memory performance. In SEDMS IV: USENIX Symposium on Experiences with Distributed and Multiprocessor Systems, pages 57--71, Berkeley, CA, USA, 1993. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, pages 1--8, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Bruening, T. Garnett, and S. Amarasinghe. An infrastructure for adaptive dynamic optimization. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, CGO '03, pages 265--275, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J.-H. Chow and V. Sarkar. False sharing elimination by selection of runtime scheduling parameters. In ICPP '97: Proceedings of the international Conference on Parallel Processing, pages 396--403, Washington, DC, USA, 1997. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. David Dice. False sharing induced by card table marking. https://blogs.oracle.com/dave/entry/false_sharing_induced_by_card, February 2011.Google ScholarGoogle Scholar
  9. S. M. Günther and J. Weidendorfer. Assessing cache false sharing effects by dynamic binary instrumentation. In WBIA '09: Proceedings of the Workshop on Binary Instrumentation and Applications, pages 26--33, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Intel Corporation. Intel Performance Tuning Utility 3.2 Update, November 2008.Google ScholarGoogle Scholar
  11. Intel Corporation. Avoiding and identifying false sharing among threads. http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/, February 2010.Google ScholarGoogle Scholar
  12. T. Iskhodzhanov, R. Kleckner, and E. Stepanov. Combining compile-time and run-time instrumentation for testing tools. Programmnye produkty i sistemy, 3:224--231, 2013.Google ScholarGoogle Scholar
  13. S. Jayasena, S. Amarasinghe, A. Abeyweera, G. Amarasinghe, H. De Silva, S. Rathnayake, X. Meng, and Y. Liu. Detection of false sharing using machine learning. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis, SC '13, pages 30:1--30:9, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Justin L. A way to determine a process's "real" memory usage, i.e. private dirty RSS? http://stackoverflow.com/questions/118307/a-way-to-determine-a-processs-real-memory-usage-i-e-private-dirty-rss, October 2011.Google ScholarGoogle Scholar
  15. C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO '04, pages 75--, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C.-L. Liu. False sharing analysis for multithreaded programs. Master's thesis, National Chung Cheng University, July 2009.Google ScholarGoogle Scholar
  17. T. Liu and E. D. Berger.textscSheriff: Precise detection and automatic mitigation of false sharing. In Proceedings of the 2011 ACM International Conference on Object-Oriented Programming Systems Languages and Applications, OOPSLA '11, pages 3--18, New York, NY, USA, 2011. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pages 190--200, New York, NY, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. mcmcc. False sharing in boost::detail::spinlock pool? http://stackoverflow.com/questions/11037655/false-sharing-in-boostdetailspinlock-pool, June 2012.Google ScholarGoogle Scholar
  20. Mikael Ronstrom. Mysql team increases scalability by >50% for sysbench oltp ro in mysql 5.6 labs release april 2012. http://mikaelronstrom.blogspot.com/2012/04/mysql-team-increases-scalability-by-50.html, April 2012.Google ScholarGoogle Scholar
  21. M. Nanavati, M. Spear, N. Taylor, S. Rajagopalan, D. T. Meyer, W. Aiello, and A. Warfield. Whose cache line is it anyway?: operating system support for live detection and repair of false sharing. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys '13, pages 141--154, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 89--100, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. K. Papadimitriou. Taming false sharing in parallel programs. Master's thesis, University of Edinburgh, 2009.Google ScholarGoogle Scholar
  24. A. Pesterev, N. Zeldovich, and R. T. Morris. Locating cache performance bottlenecks using data profiling. In EuroSys '10: Proceedings of the 5th European conference on Computer systems, pages 335--348, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 13--24, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Schindewolf. Analysis of cache misses using SIMICS. Master's thesis, Institute for Computing Systems Architecture, University of Edinburgh, 2007.Google ScholarGoogle Scholar
  27. K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX Annual Technical Conference, USENIX ATC'12, pages 28--28, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Q. Zhao, D. Koh, S. Raza, D. Bruening, W.-F. Wong, and S. Amarasinghe. Dynamic cache contention detection in multi-threaded applications. In The International Conference on Virtual Execution Environments, Newport Beach, CA, Mar 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PREDATOR: predictive false sharing detection

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 49, Issue 8
        PPoPP '14
        August 2014
        390 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2692916
        Issue’s Table of Contents
        • cover image ACM Conferences
          PPoPP '14: Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
          February 2014
          412 pages
          ISBN:9781450326568
          DOI:10.1145/2555243

        Copyright © 2014 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 February 2014

        Check for updates

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!