Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications

Published: 25 March 2016

Abstract

Many distributed workloads in today's data centers are written in managed languages such as Java or Ruby. Examples include big data frameworks such as Hadoop, data stores such as Cassandra, and applications such as the SOLR search engine. These workloads typically run across many independent language runtime systems on different nodes. This setup is a source of inefficiency because the language runtime systems are unaware of each other. For example, they may perform Garbage Collection at times that are reasonable locally but poorly chosen for the distributed application as a whole.

We address these problems by introducing the concept of a Holistic Runtime System that makes runtime-level decisions for the entire distributed application rather than locally. We then present Taurus, a Holistic Runtime System prototype. Taurus is a JVM drop-in replacement, requires almost no configuration and can run unmodified off-the-shelf Java applications. Taurus enforces user-defined coordination policies and provides a DSL for writing these policies.

By applying Taurus to Garbage Collection, we demonstrate the potential of such a system and use it to explore coordination strategies for the runtime systems of real-world distributed applications, to improve application performance and address tail-latencies in latency-sensitive workloads.
