research-article

OCTET: capturing and controlling cross-thread dependences efficiently

Published: 29 October 2013

Abstract

Parallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ensuring concurrency correctness properties - atomicity, sequential consistency, and multithreaded determinism - because all existing approaches are impractical. Dynamic, software-based approaches slow programs by up to an order of magnitude because capturing and controlling cross-thread dependences (i.e., conflicting accesses to shared memory) requires synchronization at virtually every access to potentially shared memory.

This paper introduces a new software-based concurrency control mechanism called OCTET that soundly captures cross-thread dependences and can be used to build dynamic analyses for concurrency correctness. OCTET achieves low overheads by tracking the locality state of each potentially shared object. Non-conflicting accesses conform to the locality state and require no synchronization; only conflicting accesses require a state change and heavyweight synchronization. This optimistic tradeoff leads to significant efficiency gains in capturing cross-thread dependences: a prototype implementation of OCTET in a high-performance Java virtual machine slows real-world concurrent programs by only 26% on average. A dependence recorder, suitable for record & replay, built on top of OCTET adds an additional 5% overhead on average. These results suggest that OCTET can provide a foundation for developing low-overhead analyses that check and enforce concurrency correctness.
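
The locality-state idea described above can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class, the state names (`WRITE_EXCLUSIVE`, `READ_EXCLUSIVE`, `READ_SHARED`), and the counters are assumptions made for illustration. Thread identities are passed as plain integers so the example runs deterministically on a single thread. A real system would need atomic state changes and a coordination protocol between threads on the slow path, which this sketch leaves out.

```java
// Illustrative sketch only: names and state set are assumptions, not taken from the paper's code.
final class LocalityStateSketch {
    // Hypothetical locality states for a potentially shared object.
    enum Kind { WRITE_EXCLUSIVE, READ_EXCLUSIVE, READ_SHARED }

    static final int NO_OWNER = -1;

    static final class TrackedObject {
        Kind kind;
        int owner;                 // owning thread id, or NO_OWNER when read-shared
        int value;                 // the program data being protected
        int fastPathAccesses;      // accesses that conformed to the current state
        int slowPathTransitions;   // accesses that forced a state change

        TrackedObject(int creatorThread) {
            kind = Kind.WRITE_EXCLUSIVE;
            owner = creatorThread;
        }

        int read(int tid) {
            // A read conforms if anyone may read, or if this thread owns the object.
            boolean conforms = kind == Kind.READ_SHARED || owner == tid;
            if (conforms) {
                fastPathAccesses++;            // no synchronization would be needed here
            } else {
                slowPathTransitions++;         // a real system would synchronize here
                if (kind == Kind.WRITE_EXCLUSIVE) {
                    kind = Kind.READ_EXCLUSIVE;
                    owner = tid;
                } else {
                    kind = Kind.READ_SHARED;   // second reader: share among all threads
                    owner = NO_OWNER;
                }
            }
            return value;
        }

        void write(int tid, int newValue) {
            // A write conforms only if this thread already holds write-exclusive state.
            boolean conforms = kind == Kind.WRITE_EXCLUSIVE && owner == tid;
            if (conforms) {
                fastPathAccesses++;
            } else {
                slowPathTransitions++;
                kind = Kind.WRITE_EXCLUSIVE;
                owner = tid;
            }
            value = newValue;
        }
    }

    public static void main(String[] args) {
        TrackedObject obj = new TrackedObject(1);
        obj.write(1, 10);   // conforms
        obj.read(1);        // conforms
        obj.read(2);        // state change: write-exclusive(1) to read-exclusive(2)
        obj.read(3);        // state change: read-exclusive(2) to read-shared
        obj.read(1);        // conforms
        obj.write(1, 20);   // state change: read-shared to write-exclusive(1)
        System.out.println("fast=" + obj.fastPathAccesses
                + " slow=" + obj.slowPathTransitions);
    }
}
```

In this simplified model, only the three accesses that change the state take the slow path; the other three are handled by a plain field comparison, which is the optimistic tradeoff the abstract describes.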



    Published in

      ACM SIGPLAN Notices, Volume 48, Issue 10
      OOPSLA '13
      October 2013
      867 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2544173

      OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
      October 2013
      904 pages
      ISBN: 9781450323741
      DOI: 10.1145/2509136

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
