Abstract
A longstanding challenge of shared-memory concurrency is to provide a memory model that allows for efficient implementation while providing strong and simple guarantees to programmers. The C++0x and Java memory models admit a wide variety of compiler and hardware optimizations and provide sequentially consistent (SC) semantics for data-race-free programs. However, they either provide no semantics at all (C++0x) or provide hard-to-understand semantics (Java) for racy programs, compromising the safety and debuggability of such programs.
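As a minimal sketch of the data-race-free guarantee described above, the following C++ program publishes a value through a sequentially consistent atomic flag. The names `payload`, `ready`, `producer`, `consumer`, and `run_once` are hypothetical and chosen only for illustration. Because every conflicting access is ordered by the atomic flag, the program is data-race-free and behaves as if executed under SC. If `ready` were instead a plain `bool`, the program would contain a data race and the C++ memory model would assign it no semantics.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // ordinary (non-atomic) shared data
std::atomic<bool> ready{false};  // synchronization flag; operations default to seq_cst

// Writes the data, then signals the reader through the atomic flag.
void producer() {
    payload = 42;
    ready.store(true);
}

// Waits for the signal, then reads the data; the atomic flag orders
// this read after the producer's write, so there is no data race.
int consumer() {
    while (!ready.load()) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns the value observed.
int run_once() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

In this sketch the consumer can only ever observe 42, which is the kind of simple reasoning SC semantics permits for race-free code.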
In earlier work we proposed the DRFx memory model, which addresses this problem by dynamically detecting potential violations of SC due to the interaction of compiler or hardware optimizations with data races and halting execution upon detection. In this paper, we present a detailed micro-architecture design for supporting the DRFx memory model, formalize the design and prove its correctness, and evaluate the design using a hardware simulator. We describe a set of DRFx-compliant, complexity-effective optimizations that allow us to attain performance close to that of TSO (Total Store Order) and DRF0 while providing strong guarantees for all programs.
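The abstract does not spell out how detection works, so the following is only a toy software model of the "detect and halt" idea, not the micro-architecture design the paper presents. It assumes, purely for illustration, that each thread's recent accesses are summarized as read and write address sets for a code region, and that two concurrently executing regions are flagged when they touch a common address and at least one access is a write. `Region`, `conflicts`, `check_or_halt`, and `MemoryModelException` are hypothetical names.

```cpp
#include <cstdint>
#include <set>
#include <stdexcept>

// Hypothetical model: the addresses one thread reads and writes in a region.
struct Region {
    std::set<std::uintptr_t> reads;
    std::set<std::uintptr_t> writes;
};

// Hypothetical exception type standing in for "halt execution".
struct MemoryModelException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// True if any address in a also appears in b.
bool overlaps(const std::set<std::uintptr_t>& a,
              const std::set<std::uintptr_t>& b) {
    for (std::uintptr_t addr : a) {
        if (b.count(addr) != 0) {
            return true;
        }
    }
    return false;
}

// Two concurrent regions conflict if they share an address and at
// least one of the two accesses to it is a write.
bool conflicts(const Region& x, const Region& y) {
    return overlaps(x.writes, y.writes) ||
           overlaps(x.writes, y.reads) ||
           overlaps(x.reads, y.writes);
}

// Throws instead of letting a potentially non-SC execution continue.
void check_or_halt(const Region& x, const Region& y) {
    if (conflicts(x, y)) {
        throw MemoryModelException("conflicting concurrent regions");
    }
}
```

Under these assumptions, two regions that only read the same address pass the check, while a write in one region paired with any access to the same address in the other raises the exception.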
Efficient processor support for DRFx, a memory model with exceptions
ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11)