Abstract
Many distributed workloads in today's data centers are written in managed languages such as Java or Ruby. Examples include big data frameworks such as Hadoop, data stores such as Cassandra, and applications such as the SOLR search engine. These workloads typically run across many independent language runtime systems on different nodes. This setup is a source of inefficiency because these language runtime systems are unaware of each other. For example, each may perform Garbage Collection at a time that is reasonable locally but poorly chosen for the distributed application as a whole.
We address these problems by introducing the concept of a Holistic Runtime System that makes runtime-level decisions for the entire distributed application rather than locally. We then present Taurus, a Holistic Runtime System prototype. Taurus is a JVM drop-in replacement, requires almost no configuration and can run unmodified off-the-shelf Java applications. Taurus enforces user-defined coordination policies and provides a DSL for writing these policies.
By applying Taurus to Garbage Collection, we demonstrate the potential of such a system. We use it to explore coordination strategies for the runtime systems of real-world distributed applications, with the aim of improving application performance and addressing tail latencies in latency-sensitive workloads.
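To make the idea of a cluster-wide Garbage Collection decision concrete, here is a minimal sketch of one possible coordination policy. It is not Taurus's policy DSL or API, which the abstract does not show; `NodeState`, `pick_nodes_to_collect`, the 0.7 threshold and the concurrency limit are all hypothetical names and values chosen for illustration. The policy assumed here lets at most a fixed number of nodes collect at once and gives priority to the fullest heaps.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Heap status reported by one runtime (hypothetical fields)."""
    name: str
    heap_used_fraction: float  # 0.0 .. 1.0
    collecting: bool = False


def pick_nodes_to_collect(nodes, threshold=0.7, max_concurrent=1):
    """Return the names of nodes allowed to start a collection now.

    Illustrative policy (an assumption, not the paper's): a node becomes
    a candidate once its heap usage reaches `threshold`; at most
    `max_concurrent` nodes may be collecting cluster-wide, and the
    fullest heaps are served first.
    """
    busy = sum(1 for n in nodes if n.collecting)
    free_slots = max(0, max_concurrent - busy)
    candidates = sorted(
        (n for n in nodes
         if not n.collecting and n.heap_used_fraction >= threshold),
        key=lambda n: n.heap_used_fraction,
        reverse=True,
    )
    return [n.name for n in candidates[:free_slots]]


cluster = [
    NodeState("node-a", 0.82),
    NodeState("node-b", 0.91),
    NodeState("node-c", 0.40),
]
print(pick_nodes_to_collect(cluster))  # prints ['node-b']
```

A purely local decision would let node-a and node-b pause at the same time. With a global view, the coordinator can stagger the pauses so that only one node is unavailable at any moment, which is the kind of trade-off a holistic runtime is positioned to make.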
Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems