Abstract
Distributed consistency is perhaps the most-discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they incur significant performance costs unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under faults, and these anomalies are often extremely difficult to debug. This poses significant challenges for distributed system architects and developers.
In this article, we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system and another using the Bloom declarative language.
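The idea of identifying program locations that require coordination can be illustrated with a toy sketch. This is not Blazes's actual analysis or API: the `Component` type, the `order_sensitive` and `reorders` labels, the `coordination_points` function, and the component names are all hypothetical simplifications assumed here. Each dataflow component is labeled either confluent (its output does not depend on input order) or order-sensitive, each channel is marked as possibly reordering messages, and the sketch flags order-sensitive components that may observe a nondeterministic input order.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Component:
    name: str
    # Hypothetical label: True if output depends on input arrival order
    # (e.g. keep only the first message seen per key); False if confluent
    # (e.g. a filter or set union).
    order_sensitive: bool
    # Upstream edges as (upstream_name, reorders). `reorders` marks an
    # asynchronous channel that may deliver messages out of order.
    # An upstream name of None denotes an external input source.
    inputs: list[tuple[str | None, bool]] = field(default_factory=list)


def coordination_points(components: list[Component]) -> list[str]:
    """Toy analysis: flag order-sensitive components whose input order is
    nondeterministic. Assumes coordination is inserted at every flagged
    component, so its output order becomes deterministic."""
    by_name = {c.name: c for c in components}
    deps = {c.name: {up for up, _ in c.inputs if up is not None} for c in components}
    unordered_out: dict[str, bool] = {}
    flagged: list[str] = []
    # Visit components so that every upstream component is processed first.
    for name in TopologicalSorter(deps).static_order():
        comp = by_name[name]
        unordered_in = any(
            reorders or (up is not None and unordered_out[up])
            for up, reorders in comp.inputs
        )
        if comp.order_sensitive and unordered_in:
            flagged.append(name)
        # Confluent components pass order nondeterminism downstream;
        # order-sensitive ones (coordinated or already ordered) do not.
        unordered_out[name] = unordered_in and not comp.order_sensitive
    return flagged


# Hypothetical three-stage pipeline fed by an asynchronous source.
pipeline = [
    Component("tokenize", order_sensitive=False, inputs=[(None, True)]),
    Component("first_seen", order_sensitive=True, inputs=[("tokenize", True)]),
    Component("sink", order_sensitive=False, inputs=[("first_seen", True)]),
]
print(coordination_points(pipeline))  # ['first_seen']
```

In this simplified model only `first_seen` needs coordination: the confluent stages tolerate reordering, so placing coordination everywhere would add cost without improving consistency.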
Index Terms
Blazes: Coordination Analysis and Placement for Distributed Programs