research-article

A case for an SC-preserving compiler

Published: 04 June 2011

Abstract

The most intuitive memory consistency model for shared-memory multi-threaded programming is sequential consistency (SC). However, current concurrent programming languages support a relaxed model, as such relaxations are deemed necessary for enabling important optimizations. This paper demonstrates that an SC-preserving compiler, one that ensures that every SC behavior of a compiler-generated binary is an SC behavior of the source program, retains most of the performance benefits of an optimizing compiler. The key observation is that a large class of optimizations crucial for performance are either already SC-preserving or can be modified to preserve SC while retaining much of their effectiveness. An SC-preserving compiler, obtained by restricting the optimization phases in LLVM, a state-of-the-art C/C++ compiler, incurs an average slowdown of 3.8% and a maximum slowdown of 34% on a set of 30 programs from the SPLASH-2, PARSEC, and SPEC CINT2006 benchmark suites.
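To make the definition of SC-preservation concrete, the following minimal sketch enumerates every sequentially consistent interleaving of a small two-thread program, before and after common-subexpression elimination. The program, the variable names (`X`, `Y`, `t`, `u`, `v`), and the helper functions are illustrative assumptions, not taken from the abstract. An outcome that appears only for the transformed program is an SC behavior of the binary that is not an SC behavior of the source.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each thread's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]

def outcomes(reader, writer):
    """Run every SC interleaving from X = Y = 0 and collect the reader's (u, v)."""
    results = set()
    for schedule in interleavings(reader, writer):
        mem = {"X": 0, "Y": 0}
        regs = {}
        for step in schedule:
            step(mem, regs)
        results.add((regs["u"], regs["v"]))
    return results

# Writer thread: X = 1; Y = 1
def store_x(mem, regs): mem["X"] = 1
def store_y(mem, regs): mem["Y"] = 1

# Reader thread as written: t = X*2; u = Y; v = X*2
def load_t(mem, regs): regs["t"] = mem["X"] * 2
def load_u(mem, regs): regs["u"] = mem["Y"]
def load_v(mem, regs): regs["v"] = mem["X"] * 2

# Reader thread after CSE: the second X*2 is replaced by the earlier result t
def reuse_t(mem, regs): regs["v"] = regs["t"]

WRITER = [store_x, store_y]
ORIGINAL = [load_t, load_u, load_v]
TRANSFORMED = [load_t, load_u, reuse_t]

original_outcomes = outcomes(ORIGINAL, WRITER)
transformed_outcomes = outcomes(TRANSFORMED, WRITER)
new_behaviors = transformed_outcomes - original_outcomes
print(sorted(new_behaviors))  # [(1, 0)]
```

In this toy program the outcome `u == 1, v == 0` is reachable only after the transformation, so this particular use of common-subexpression elimination would not be SC-preserving when `X` and `Y` are shared.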

While the performance overhead of preserving SC in the compiler is much less than previously assumed, it might still be unacceptable for certain applications. We believe there are several avenues for improving performance without giving up SC-preservation. In this vein, we observe that the overhead of our SC-preserving compiler arises mainly from its inability to aggressively perform a class of optimizations we identify as eager-load optimizations. This class includes common-subexpression elimination, constant propagation, global value numbering, and common cases of loop-invariant code motion. We propose a notion of interference checks in order to enable eager-load optimizations while preserving SC. Interference checks expose to the compiler a commonly used hardware speculation mechanism that can efficiently detect whether a particular variable has changed its value since last read.
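The idea of an interference check can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the abstract describes a hardware speculation mechanism, whereas the `MonitoredCell` class, its write counter, and the `between` hook below are hypothetical stand-ins invented for illustration. The reader computes `t = X*2; u = Y; v = X*2` and reuses the eagerly loaded `t` for `v` only if `X` has not been written since it was first read. Counting writes is conservative: it also flags a write that stores the same value.

```python
class MonitoredCell:
    """A shared variable with a write counter, standing in for hardware monitoring."""

    def __init__(self, value):
        self.value = value
        self.writes = 0

    def store(self, value):
        self.value = value
        self.writes += 1

    def load(self):
        # Return the value together with the write count observed at load time.
        return self.value, self.writes

    def changed_since(self, seen_writes):
        return self.writes != seen_writes


def reader(x, y, between=lambda: None):
    """Compute t = X*2; u = Y; v = X*2, reusing t for v only if X was not written."""
    x_val, x_seen = x.load()
    t = x_val * 2
    between()  # hypothetical hook: lets a caller inject another thread's writes here
    u, _ = y.load()
    if x.changed_since(x_seen):
        # Interference detected: fall back to the unoptimized reload of X.
        v = x.load()[0] * 2
    else:
        # No interference: the eagerly loaded value is still valid.
        v = t
    return u, v


x, y = MonitoredCell(0), MonitoredCell(0)

def writer():
    x.store(1)
    y.store(1)

quiet = reader(MonitoredCell(0), MonitoredCell(0))
raced = reader(x, y, between=writer)
print(quiet, raced)  # (0, 0) (1, 2)
```

With the check in place, the racing run reloads `X` and returns `(1, 2)`, an outcome the unoptimized program could also produce, while the uncontended run keeps the benefit of reusing `t`.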


Published in

ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11)
June 2011, 652 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1993316

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011, 668 pages
ISBN: 9781450306638
DOI: 10.1145/1993498
General Chair: Mary Hall
Program Chair: David Padua

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 4 June 2011
