Abstract
A *memory consistency model* (or simply *memory model*) defines the possible values that a shared-memory read may return in a multithreaded programming language. Choosing a memory model involves an inherent performance-programmability tradeoff. The Java language has adopted a *relaxed* (or *weak*) memory model that is designed to admit most traditional compiler optimizations and obviate the need for hardware fences on most shared-memory accesses. The downside, however, is that programmers are exposed to a complex and unintuitive semantics and must carefully declare certain variables as `volatile` in order to enforce program orderings that are necessary for proper behavior.
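To make the role of `volatile` concrete, the following minimal sketch runs the classic store-buffering litmus test. It is an illustration, not code from the paper; the class name `StoreBufferingDemo` and the method `countBothZero` are hypothetical. Because `x` and `y` are declared `volatile`, the Java memory model forbids the outcome in which both threads read 0. Removing the two `volatile` modifiers makes that outcome legal, and it may then be observed on x86 hardware.

```java
public class StoreBufferingDemo {
    // Volatile accesses are totally ordered, so each write below
    // is ordered before the read that follows it in the same thread.
    static volatile int x;
    static volatile int y;

    // Plain fields: they are read by the caller only after join(),
    // which already establishes the needed happens-before edge.
    static int r1;
    static int r2;

    /** Runs the litmus test repeatedly and counts runs where both reads saw 0. */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; });
            Thread b = new Thread(() -> { y = 1; r2 = x; });
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++; // impossible while x and y are volatile
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(10_000));
    }
}
```

With the `volatile` modifiers in place the count must be 0 on any conforming JVM. Without them, a nonzero count is permitted but not guaranteed to appear in any particular run.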
This paper proposes a simpler and stronger memory model for Java through a conceptually small change: *every* variable has `volatile` semantics by default, but the language allows a programmer to tag certain variables, methods, or classes as `relaxed` and provides the current Java semantics for these portions of code. This *volatile-by-default* semantics provides *sequential consistency* (SC) for all programs by default. At the same time, expert programmers retain the freedom to build performance-critical libraries that violate the SC semantics.
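The proposed default can be pictured with a small sketch. The abstract does not give the concrete syntax of the `relaxed` tag, so the `@Relaxed` annotation below is purely hypothetical, as are the class names `VbdSketch` and `Mailbox`. The code is written for a stock JVM, which is why the `volatile` modifiers appear explicitly; under `volatile`-by-default they would be implied, and only the opt-out would need to be written.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class VbdSketch {
    /** Hypothetical opt-out marker; a stock JVM attaches no meaning to it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD, ElementType.TYPE})
    @interface Relaxed {}

    static class Mailbox {
        // Explicit here so the class is correct on an unmodified JVM.
        // Under volatile-by-default these modifiers would be the default.
        volatile int payload;
        volatile boolean ready;

        // Assumed opt-out: a statistics counter that tolerates
        // today's relaxed Java semantics.
        @Relaxed int approximateReads;

        void publish(int value) {
            payload = value;
            ready = true; // volatile write publishes the payload
        }

        int awaitPayload() {
            while (!ready) { // volatile read, so the loop cannot spin forever
                Thread.yield();
            }
            approximateReads++;
            return payload;
        }
    }

    /** Publishes a value from one thread and reads it back on the caller's thread. */
    static int roundTrip(int value) {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.publish(value));
        producer.start();
        int seen = box.awaitPayload();
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

For example, `VbdSketch.roundTrip(42)` returns 42, because the volatile write to `ready` orders the earlier write to `payload` before any read that observes `ready == true`. If `ready` were a plain field under the current Java semantics, the reader could loop forever.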
At the outset, it is unclear whether the `volatile`-by-default semantics is practical for Java, given the cost of memory fences on today's hardware platforms. The core contribution of this paper is to demonstrate, through comprehensive empirical evaluation, that the `volatile`-by-default semantics is arguably acceptable for a predominant use case for Java today: server-side applications running on Intel x86 architectures. We present VBD-HotSpot, a modification to Oracle's widely used HotSpot JVM that implements the `volatile`-by-default semantics for x86. To our knowledge, VBD-HotSpot is the first implementation of SC for Java in the context of a modern JVM. VBD-HotSpot incurs an average overhead of 28% over the baseline HotSpot JVM on the DaCapo benchmarks, which is significant though perhaps less than commonly assumed. Further, VBD-HotSpot incurs average overheads of 12% and 19%, respectively, on standard benchmark suites for big-data analytics and machine learning in the widely used Spark framework.
Index Terms
A volatile-by-default JVM for server applications