research-article

DRFx: An Understandable, High Performance, and Flexible Memory Model for Concurrent Languages

Published: 15 September 2016

Abstract

The most intuitive memory model for shared-memory multi-threaded programming is sequential consistency (SC), but it disallows the use of many compiler and hardware optimizations and thus affects performance. Data-race-free (DRF) models, such as the C++11 memory model, guarantee SC execution for data-race-free programs. But these models provide no guarantee at all for racy programs, compromising the safety and debuggability of such programs. To address the safety issue, the Java memory model, which is also based on the DRF model, provides a weak semantics for racy executions. However, this semantics is subtle and complex, making it difficult for programmers to reason about their programs and for compiler writers to ensure the correctness of compiler optimizations.
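To make the SC guarantee concrete, the following minimal sketch enumerates every sequentially consistent interleaving of the classic two-thread "store buffering" litmus test. The litmus test, the tuple encoding of operations, and the helper names (`interleavings`, `run`, `sc_outcomes`) are illustrative assumptions and do not come from the article. Under SC the outcome `r1 == r2 == 0` never appears, but it does appear once the store and load of a single thread are swapped, which is the kind of reordering that compilers and hardware would otherwise like to perform.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order of operations against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, dst, src in schedule:
        if op == "store":
            mem[dst] = src          # src is the constant being stored
        else:
            regs[dst] = mem[src]    # load: src is the memory location
    return regs["r1"], regs["r2"]


def sc_outcomes(t1, t2):
    """All (r1, r2) results reachable by interleaving t1 and t2 in program order."""
    return {run(s) for s in interleavings(t1, t2)}


# Thread 1: x = 1; r1 = y      Thread 2: y = 1; r2 = x
T1 = [("store", "x", 1), ("load", "r1", "y")]
T2 = [("store", "y", 1), ("load", "r2", "x")]

sc = sc_outcomes(T1, T2)

# Model an optimization that moves thread 1's load ahead of its store.
reordered = sc_outcomes(list(reversed(T1)), T2)

print("SC outcomes:", sorted(sc))
print("(0, 0) possible after reordering:", (0, 0) in reordered)
```

Both threads race on `x` and `y` here, so this is precisely the kind of program for which a DRF model gives no guarantee.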

We present the DRFx memory model, which is simple for programmers to understand and use while still supporting many common optimizations. We introduce a memory model (MM) exception that can be signaled to halt execution. If a program executes without throwing this exception, then DRFx guarantees that the execution is SC. If a program throws an MM exception during an execution, then DRFx guarantees that the program has a data race. We observe that SC violations can be detected in hardware through a lightweight form of conflict detection. Furthermore, our model safely allows aggressive compiler and hardware optimizations within compiler-designated program regions. We formalize our memory model, prove several properties of this model, describe a compiler and hardware design suitable for DRFx, and evaluate the performance overhead due to our compiler and hardware requirements.
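The idea of conflict detection between program regions can be illustrated with a small software sketch. Everything below is a hypothetical model written for illustration: the `Region` record, the `MMException` class, and the pairwise check are assumptions, not the hardware design described in the article. The sketch assumes that each region records the addresses it reads and writes, and that two regions from different threads conflict when they touch a common address and at least one of the accesses is a write.

```python
from dataclasses import dataclass, field


class MMException(Exception):
    """Hypothetical stand-in for the memory model (MM) exception."""


@dataclass
class Region:
    """Hypothetical record of the addresses one region has accessed."""
    thread: int
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def conflicts(a, b):
    """True if regions from different threads share an address with a write involved."""
    if a.thread == b.thread:
        return False
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def check_concurrent(regions):
    """Raise MMException if any two of the concurrently executing regions conflict."""
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if conflicts(a, b):
                raise MMException(
                    f"conflict between thread {a.thread} and thread {b.thread}"
                )


# Two regions that only read the same address do not conflict.
check_concurrent([Region(0, reads={"x"}), Region(1, reads={"x"})])

# A write in one region and a read of the same address in another does.
try:
    check_concurrent([Region(0, writes={"x"}), Region(1, reads={"x"})])
    raised = False
except MMException:
    raised = True
print("exception raised for racy regions:", raised)
```

In this simplified model an exception is raised only when two regions that run at the same time actually conflict, which mirrors the abstract's statement that an MM exception implies the program has a data race.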

        • Published in

          ACM Transactions on Programming Languages and Systems  Volume 38, Issue 4
          October 2016
          204 pages
          ISSN:0164-0925
          EISSN:1558-4593
          DOI:10.1145/2982214

          Copyright © 2016 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 15 September 2016
          • Accepted: 1 April 2016
          • Revised: 1 February 2016
          • Received: 1 July 2013

          Qualifiers

          • research-article
          • Research
          • Refereed
