research-article

Drinking from both glasses: combining pessimistic and optimistic tracking of cross-thread dependences

Published: 27 February 2016

Abstract

It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism---such as multithreaded record & replay, data race detectors, transactional memory, and enforcement of stronger memory models---helps achieve these goals, but existing commodity solutions slow programs substantially in order to track (i.e., detect or control) an execution's cross-thread dependences accurately. Prior work tracks cross-thread dependences either "pessimistically," slowing every program access, or "optimistically," allowing for lightweight instrumentation of most accesses but dramatically slowing accesses involved in cross-thread dependences.

This paper seeks to hybridize pessimistic and optimistic tracking, which is challenging because there exists a fundamental mismatch between pessimistic and optimistic tracking. We address this challenge based on insights about how dependence tracking and program synchronization interact, and introduce a novel approach called hybrid tracking. Hybrid tracking is suitable for building efficient runtime support, which we demonstrate by building hybrid-tracking-based versions of a dependence recorder and a region serializability enforcer. An adaptive, profile-based policy makes runtime decisions about switching between pessimistic and optimistic tracking. Our evaluation shows that hybrid tracking enables runtime support to overcome the performance limitations of both pessimistic and optimistic tracking alone.
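The idea of an adaptive, profile-based policy can be illustrated with a toy sketch. The abstract does not specify the policy, so everything here is an assumption made for illustration: the class name `HybridPolicySketch`, the fixed `CONFLICT_THRESHOLD`, and the rule "switch an object to pessimistic tracking after a few cross-thread accesses" are hypothetical, and the sketch models only the switching decision, not the tracking mechanisms themselves.

```java
/**
 * Toy model of a per-object tracking policy. Illustrative only: the names,
 * the threshold, and the switching rule are assumptions, not the paper's design.
 */
class HybridPolicySketch {
    enum Mode { OPTIMISTIC, PESSIMISTIC }

    // Hypothetical threshold: cross-thread accesses tolerated before switching.
    static final int CONFLICT_THRESHOLD = 3;

    private Mode mode = Mode.OPTIMISTIC;
    private long lastThreadId = -1; // -1 means "not accessed yet"
    private int conflicts = 0;      // profiled cross-thread accesses so far

    /** Records an access by threadId and returns the mode in effect afterwards. */
    synchronized Mode onAccess(long threadId) {
        if (mode == Mode.PESSIMISTIC) {
            return mode; // in this sketch, an object never switches back
        }
        if (lastThreadId != -1 && lastThreadId != threadId) {
            // A different thread touched the object: the expensive case for
            // optimistic tracking, so count it in the profile.
            conflicts++;
            if (conflicts >= CONFLICT_THRESHOLD) {
                mode = Mode.PESSIMISTIC;
            }
        }
        lastThreadId = threadId;
        return mode;
    }

    public static void main(String[] args) {
        HybridPolicySketch obj = new HybridPolicySketch();
        long[] accessingThreads = {1, 1, 1, 2, 1, 2, 2};
        for (long t : accessingThreads) {
            System.out.println("thread " + t + " -> " + obj.onAccess(t));
        }
    }
}
```

In this sketch, an object accessed repeatedly by one thread stays in the cheap optimistic mode, while an object that bounces between threads is moved to pessimistic tracking once its profile crosses the threshold. A real policy would also need to decide whether and when to switch back, which the sketch leaves out.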

  • Published in

    ACM SIGPLAN Notices  Volume 51, Issue 8
    PPoPP '16
    August 2016
    405 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3016078
    • ACM Conferences
      PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2016
      420 pages
      ISBN: 9781450340922
      DOI: 10.1145/2851141

    Copyright © 2016 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States
