Abstract
The most intuitive memory consistency model for shared-memory multi-threaded programming is sequential consistency (SC). However, current concurrent programming languages support a relaxed model, because such relaxations are deemed necessary for enabling important optimizations. This paper demonstrates that an SC-preserving compiler, one that ensures that every SC behavior of a compiler-generated binary is an SC behavior of the source program, retains most of the performance benefits of an optimizing compiler. The key observation is that a large class of optimizations crucial for performance are either already SC-preserving or can be modified to preserve SC while retaining much of their effectiveness. An SC-preserving compiler, obtained by restricting the optimization phases in LLVM, a state-of-the-art C/C++ compiler, incurs an average slowdown of 3.8% and a maximum slowdown of 34% on a set of 30 programs from the SPLASH-2, PARSEC, and SPEC CINT2006 benchmark suites.
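The definition of an SC-preserving compiler can be checked by brute force on tiny programs. The following Python sketch, an illustration rather than the paper's LLVM-based implementation, enumerates every SC interleaving of a two-thread program and tests whether each behavior of the transformed program is also a behavior of the source. The instruction encoding, the helper names (`interleavings`, `run`, `sc_behaviors`, `is_sc_preserving`), and the choice of final register values as the observable behavior are assumptions made for this example.

```python
"""Brute-force check of SC preservation for tiny two-thread programs.

Hypothetical encoding: an instruction is ("load", reg, var), ("store", var, value),
or ("copy", dst_reg, src_reg). Registers are thread-local; use distinct names per thread.
"""


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order (SC)."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule, init):
    """Execute one interleaving; the observable behavior is the final register file."""
    mem, regs = dict(init), {}
    for op in schedule:
        if op[0] == "load":
            regs[op[1]] = mem[op[2]]
        elif op[0] == "store":
            mem[op[1]] = op[2]
        elif op[0] == "copy":
            regs[op[1]] = regs[op[2]]
        else:
            raise ValueError(f"unknown op {op!r}")
    return tuple(sorted(regs.items()))


def sc_behaviors(program, init):
    """Set of behaviors over all SC interleavings of a (thread1, thread2) pair."""
    t1, t2 = program
    return {run(s, init) for s in interleavings(t1, t2)}


def is_sc_preserving(source, target, init):
    """True if every SC behavior of target is also an SC behavior of source."""
    return sc_behaviors(target, init) <= sc_behaviors(source, init)


INIT = {"X": 0, "Y": 0}
writer = [("store", "X", 1), ("store", "Y", 1)]

# Eliminating a load that immediately follows an identical load.
adj_src = ([("load", "r1", "X"), ("load", "r3", "X"), ("load", "r2", "Y")], writer)
adj_opt = ([("load", "r1", "X"), ("copy", "r3", "r1"), ("load", "r2", "Y")], writer)
print(is_sc_preserving(adj_src, adj_opt, INIT))  # True

# Reusing r1 across an intervening load of Y (common-subexpression elimination).
cse_src = ([("load", "r1", "X"), ("load", "r2", "Y"), ("load", "r3", "X")], writer)
cse_opt = ([("load", "r1", "X"), ("load", "r2", "Y"), ("copy", "r3", "r1")], writer)
print(is_sc_preserving(cse_src, cse_opt, INIT))  # False
print(sc_behaviors(cse_opt, INIT) - sc_behaviors(cse_src, INIT))
# {(('r1', 0), ('r2', 1), ('r3', 0))}
```

Eliminating a load that directly follows an identical load is SC-preserving, because any SC execution of the optimized code corresponds to a source execution in which the two loads run back to back. Reusing `r1` across the load of `Y` is not: the optimized reader can observe `r2 = 1` (so the writer has already set `X`) while still reporting the stale `r3 = 0`, an outcome no SC execution of the source allows.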
While the performance overhead of preserving SC in the compiler is much less than previously assumed, it might still be unacceptable for certain applications. We believe there are several avenues for improving performance without giving up SC preservation. In this vein, we observe that the overhead of our SC-preserving compiler arises mainly from its inability to aggressively perform a class of optimizations we identify as eager-load optimizations. This class includes common-subexpression elimination, constant propagation, global value numbering, and common cases of loop-invariant code motion. We propose interference checks to enable eager-load optimizations while preserving SC. Interference checks expose to the compiler a commonly used hardware speculation mechanism that can efficiently detect whether a particular variable has changed its value since it was last read.
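The interference-check idea can be modeled in software to see why it restores SC for an eager-load optimization. The sketch below is a hypothetical model, not the paper's compiler or any real hardware interface: each shared location carries a version counter that every store increments, loosely in the spirit of hardware mechanisms such as Itanium's advanced-load address table (ALAT). The optimized reader reuses the eagerly loaded `r1` for `r3` only if no store to `X` has occurred since `r1` was read, and otherwise reloads `X`. The op name `checked_copy`, the version counters, and the reload fallback are assumptions made for illustration; the model is conservative, since it flags any intervening store even if the value written is the same.

```python
"""Software model of an interference check guarding an eagerly reused load.

Hypothetical model: every store bumps a per-location version counter, and a
checked reuse of a loaded register is valid only if that version is unchanged.
"""


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order (SC)."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule, init):
    """Execute one interleaving and return the final register file."""
    mem = dict(init)
    version = {var: 0 for var in init}  # incremented on every store to var
    regs, tokens = {}, {}
    for op in schedule:
        kind = op[0]
        if kind == "load":  # ("load", reg, var)
            _, reg, var = op
            regs[reg] = mem[var]
            tokens[reg] = (var, version[var])  # remember which version was read
        elif kind == "store":  # ("store", var, value)
            _, var, value = op
            mem[var] = value
            version[var] += 1
        elif kind == "checked_copy":  # ("checked_copy", dst, src), one atomic step here
            _, dst, src = op
            var, seen = tokens[src]
            if version[var] == seen:
                regs[dst] = regs[src]  # no interference: the eager value is still current
            else:
                regs[dst] = mem[var]  # interference detected: fall back to a reload
        else:
            raise ValueError(f"unknown op {op!r}")
    return tuple(sorted(regs.items()))


def sc_behaviors(t1, t2, init):
    """Set of behaviors over all SC interleavings of two threads."""
    return {run(s, init) for s in interleavings(t1, t2)}


INIT = {"X": 0, "Y": 0}
writer = [("store", "X", 1), ("store", "Y", 1)]
source = [("load", "r1", "X"), ("load", "r2", "Y"), ("load", "r3", "X")]
checked = [("load", "r1", "X"), ("load", "r2", "Y"), ("checked_copy", "r3", "r1")]

src_b = sc_behaviors(source, writer, INIT)
chk_b = sc_behaviors(checked, writer, INIT)
print(chk_b <= src_b)  # True
```

In this model the checked reuse always yields the value `X` holds at that program point, so the optimized reader has exactly the SC behaviors of the source; the stale outcome `r1 = 0, r2 = 1, r3 = 0` that plain reuse of `r1` would permit cannot occur. The model treats the check and the reuse as a single atomic step.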
A case for an SC-preserving compiler
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation