Abstract
Parallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ensuring concurrency correctness properties - atomicity, sequential consistency, and multithreaded determinism - because all existing approaches are impractical. Dynamic, software-based approaches slow programs by up to an order of magnitude because capturing and controlling cross-thread dependences (i.e., conflicting accesses to shared memory) requires synchronization at virtually every access to potentially shared memory.
This paper introduces a new software-based concurrency control mechanism called OCTET that soundly captures cross-thread dependences and can be used to build dynamic analyses for concurrency correctness. OCTET achieves low overheads by tracking the locality state of each potentially shared object. Non-conflicting accesses conform to the locality state and require no synchronization; only conflicting accesses require a state change and heavyweight synchronization. This optimistic tradeoff leads to significant efficiency gains in capturing cross-thread dependences: a prototype implementation of OCTET in a high-performance Java virtual machine slows real-world concurrent programs by only 26% on average. A dependence recorder, suitable for record & replay, built on top of OCTET adds an additional 5% overhead on average. These results suggest that OCTET can provide a foundation for developing low-overhead analyses that check and enforce concurrency correctness.
- S. V. Adve and H.-J. Boehm. Memory Models: A Case for Rethinking Parallel Languages and Hardware. CACM, 53:90--101, 2010. Google Scholar
Digital Library
- S. V. Adve and M. D. Hill. Weak Ordering - A New Definition. In ISCA, pages 2--14, 1990. Google Scholar
Digital Library
- B. Alpern, S. Augart, S. M. Blackburn, M. Butrico, A. Cocchi, P. Cheng, J. Dolby, S. Fink, D. Grove, M. Hind, K. S. McKinley, M. Mergen, J. E. B. Moss, T. Ngo, and V. Sarkar. The Jikes Research Virtual Machine Project: Building an Open-Source Research Community. IBM Systems Journal, 44:399--417, 2005. Google Scholar
Digital Library
- H. E. Bal, M. F. Kaashoek, and A. S. Tanenbaum. Orca: A Language For Parallel Programming of Distributed Systems. IEEE TSE, 18:190--205, 1992. Google Scholar
Digital Library
- L. Baugh, N. Neelakantam, and C. Zilles. Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory. In ISCA, pages 115--126, 2008. Google Scholar
Digital Library
- T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, pages 53--64, 2010. Google Scholar
Digital Library
- S. Biswas, J. Huang, A. Sengupta, and M. D. Bond. DoubleChecker: Efficient Sound and Precise Atomicity Checking. Unpublished, 2013.Google Scholar
Digital Library
- S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA, pages 169--190, 2006. Google Scholar
Digital Library
- R. L. Bocchino, Jr., V. S. Adve, S. V. Adve, and M. Snir. Parallel Programming Must Be Deterministic by Default. In HotPar, pages 4--9, 2009. Google Scholar
Digital Library
- C. Boyapati, R. Lee, and M. Rinard. Ownership Types for Safe Programming: Preventing Data Races and Deadlocks. In OOPSLA, pages 211--230, 2002. Google Scholar
Digital Library
- M. Burrows. How to Implement Unnecessary Mutexes. In Computer Systems Theory, Technology, and Applications, pages 51--57. Springer-Verlag, 2004.Google Scholar
- L. Ceze, J. Devietti, B. Lucia, and S. Qadeer. A Case for System Support for Concurrency Exceptions. In HotPar, 2009. Google Scholar
Digital Library
- J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs. In PLDI, pages 258--269, 2002. Google Scholar
Digital Library
- J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, pages 85--96, 2009. Google Scholar
Digital Library
- T. Elmas, S. Qadeer, and S. Tasiran. Goldilocks: A Race and Transaction-Aware Java Runtime. In PLDI, pages 245--255, 2007. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. FastTrack: Efficient and Precise Dynamic Race Detection. In PLDI, pages 121--133, 2009. Google Scholar
Digital Library
- C. Flanagan and S. N. Freund. The RoadRunner Dynamic Analysis Framework for Concurrent Programs. In ACM Workshop on Program Analysis for Software Tools and Engineering, pages 1--8, 2010. Google Scholar
Digital Library
- C. Flanagan, S. N. Freund, and J. Yi. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, pages 293--303, 2008. Google Scholar
Digital Library
- K. Gharachorloo and P. B. Gibbons. Detecting Violations of Sequential Consistency. In SPAA, pages 316--326, 1991. Google Scholar
Digital Library
- T. Harris, J. Larus, and R. Rajwar. Transactional Memory. Morgan and Claypool Publishers, 2nd edition, 2010. Google Scholar
Digital Library
- M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In ISCA, pages 289--300, 1993. Google Scholar
Digital Library
- B. Hindman and D. Grossman. Atomicity via Source-to-Source Translation. In MSPC, pages 82--91, 2006. Google Scholar
Digital Library
- D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two Hardware-Based Approaches for Deterministic Multiprocessor Replay. CACM, 52:93--100, 2009. Google Scholar
Digital Library
- T. Kalibera, M. Mole, R. Jones, and J. Vitek. A Black-box Approach to Understanding Concurrency in DaCapo. In OOPSLA, pages 335--354, 2012. Google Scholar
Digital Library
- K. Kawachiya, A. Koseki, and T. Onodera. Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations. In OOPSLA, pages 130--141, 2002. Google Scholar
Digital Library
- P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. In USENIX, pages 115--132, 1994. Google Scholar
Digital Library
- L. I. Kontothanassis and M. L. Scott. Software Cache Coherence for Large Scale Multiprocessors. In HPCA, pages 286--295, 1995. Google Scholar
Digital Library
- L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. CACM, 21(7):558--565, 1978. Google Scholar
Digital Library
- L. Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Computer, 28:690--691, 1979. Google Scholar
Digital Library
- T. J. LeBlanc and J. M. Mellor-Crummey. Debugging Parallel Programs with Instant Replay. IEEE TOC, 36:471--482, 1987. Google Scholar
Digital Library
- D. Lee, P. M. Chen, J. Flinn, and S. Narayanasamy. Chimera: Hybrid Program Analysis for Determinism. In PLDI, pages 463--474, 2012. Google Scholar
Digital Library
- D. Lee, B. Wester, K. Veeraraghavan, S. Narayanasamy, P. M. Chen, and J. Flinn. Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism. In ASPLOS, pages 77--90, 2010. Google Scholar
Digital Library
- D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. The Stanford Dash Multiprocessor. IEEE Computer, 25:63--79, 1992. Google Scholar
Digital Library
- T. Liu, C. Curtsinger, and E. D. Berger. Dthreads: Efficient Deterministic Multithreading. In SOSP, pages 327--336, 2011. Google Scholar
Digital Library
- B. Lucia, L. Ceze, K. Strauss, S. Qadeer, and H.-J. Boehm. Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races. In ISCA, pages 210--221, 2010. Google Scholar
Digital Library
- J. Manson, W. Pugh, and S. V. Adve. The Java Memory Model. In POPL, pages 378--391, 2005. Google Scholar
Digital Library
- D. Marino, A. Singh, T. Millstein, M. Musuvathi, and S. Narayanasamy. DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. In PLDI, pages 351--362, 2010. Google Scholar
Digital Library
- K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. LogTM: Log-based Transactional Memory. In HPCA, pages 254--265, 2006.Google Scholar
Digital Library
- M. Naik and A. Aiken. Conditional Must Not Aliasing for Static Race Detection. In POPL, pages 327--338, 2007. Google Scholar
Digital Library
- N. Nethercote and J. Seward. How to Shadow Every Byte of Memory Used by a Program. In ACM/USENIX International Conference on Virtual Execution Environments, pages 65--74, 2007. Google Scholar
Digital Library
- M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, pages 97--108, 2009. Google Scholar
Digital Library
- M. Olszewski, Q. Zhao, D. Koh, J. Ansel, and S. Amarasinghe. Aikido: Accelerating Shared Data Dynamic Analyses. In ASPLOS, pages 173--184, 2012. Google Scholar
Digital Library
- T. Onodera, K. Kawachiya, and A. Koseki. Lock Reservation for Java Reconsidered. In ECOOP, pages 559--583, 2004.Google Scholar
Cross Ref
- M. S. Papamarcos and J. H. Patel. A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories. In ISCA, pages 348--354, 1984. Google Scholar
Digital Library
- S. Park, Y. Zhou, W. Xiong, Z. Yin, R. Kaushik, K. H. Lee, and S. Lu. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, pages 177--192, 2009. Google Scholar
Digital Library
- K. Russell and D. Detlefs. Eliminating Synchronization-Related Atomic Operations with Biased Locking and Bulk Rebiasing. In OOPSLA, pages 263--272, 2006. Google Scholar
Digital Library
- D. J. Scales, K. Gharachorloo, and C. A. Thekkath. Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory. In ASPLOS, pages 174--185, 1996. Google Scholar
Digital Library
- I. Schoinas, B. Falsafi, A. R. Lebeck, S. K. Reinhardt, J. R. Larus, and D. A. Wood. Fine-Grain Access Control for Distributed Shared Memory. In ASPLOS, pages 297--306, 1994. Google Scholar
Digital Library
- A. Sengupta, S. Biswas, M. D. Bond, and M. Kulkarni. EnforSCer: Hybrid Static-Dynamic Analysis for End-to-End Sequential Consistency in Software. Technical Report OSU-CISRC-11/12-TR18, Computer Science & Engineering, Ohio State University, 2012.Google Scholar
- K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. DoublePlay: Parallelizing Sequential Logging and Replay. In ASPLOS, pages 15--26, 2011. Google Scholar
Digital Library
- C. von Praun and T. R. Gross. Object Race Detection. In OOPSLA, pages 70--82, 2001. Google Scholar
Digital Library
- C. von Praun and T. R. Gross. Static Conflict Analysis for Multi-Threaded Object-Oriented Programs. In PLDI, pages 115--128, 2003. Google Scholar
Digital Library
- L. Wang and S. D. Stoller. Runtime Analysis of Atomicity for Multithreaded Programs. IEEE TSE, 32:93--110, 2006. Google Scholar
Digital Library
- X. Yang, S. M. Blackburn, D. Frampton, and A. L. Hosking. Barriers Reconsidered, Friendlier Still! In ISMM, pages 37--48, 2012. Google Scholar
Digital Library
- X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S. McKinley. Why Nothing Matters: The Impact of Zeroing. In OOPSLA, pages 307--324, 2011. Google Scholar
Digital Library
- M. Zhang, J. Huang, and M. D. Bond. LarkTM: Efficient, Strongly Atomic Software Transactional Memory. Technical Report OSU-CISRC-11/12-TR17, Computer Science & Engineering, Ohio State University, 2012.Google Scholar
Index Terms
OCTET: capturing and controlling cross-thread dependences efficiently
Recommendations
OCTET: capturing and controlling cross-thread dependences efficiently
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applicationsParallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ...
Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support
Special Issue: Invited papers from PPoPP 2016, Part 2It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism—such as multithreaded record and replay, data race detectors, transactional memory, and enforcement of stronger memory ...
Valor: efficient, software-only region conflict exceptions
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and ApplicationsData races complicate programming language semantics, and a data race is often a bug. Existing techniques detect data races and define their semantics by detecting conflicts between synchronization-free regions (SFRs). However, such techniques either ...







Comments