Abstract
The CAP theorem is a fundamental result that applies to distributed storage systems. In this article, we first present and prove two CAP-like impossibility theorems. To state these theorems, we present probabilistic models to characterize the three important elements of the CAP theorem: consistency (C), availability or latency (A), and partition tolerance (P). The theorems show the unachievable envelope, that is, which combinations of the parameters of the three models are impossible to achieve together. Next, we present the design of a class of systems called Probabilistic CAP (PCAP) that perform close to the envelope described by our theorems. In addition, these systems allow applications running on a single data center to specify either a latency Service Level Agreement (SLA) or a consistency SLA. The PCAP systems automatically adapt, in real time and under changing network conditions, to meet the SLA while optimizing the other C/A metric. We incorporate PCAP into two popular key-value stores: Apache Cassandra and Riak. Our experiments with these two deployments, under realistic workloads, reveal that the PCAP systems meet SLAs satisfactorily and perform close to the achievable envelope. We also extend PCAP from a single data center to multiple geo-distributed data centers.
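To make the adaptive behavior described above concrete, the following sketch illustrates one way a probabilistic latency SLA of the form "at most a fraction p_ua of reads may take longer than t_a" could be checked over a measurement window and used to drive a single control knob, such as an artificial read delay that trades latency for fresher reads. This is a minimal illustration under our own assumptions; the names (LatencySLA, adapt_read_delay, step_ms, and so on) are hypothetical and do not reflect the actual PCAP controller or the Cassandra/Riak integrations.

    # Hedged, illustrative sketch only -- not the paper's implementation.
    from dataclasses import dataclass

    @dataclass
    class LatencySLA:
        t_a_ms: float   # latency threshold t_a (milliseconds)
        p_ua: float     # allowed fraction of reads slower than t_a

    def sla_met(latencies_ms, sla: LatencySLA) -> bool:
        """Check the probabilistic latency SLA over one measurement window."""
        if not latencies_ms:
            return True
        slow = sum(1 for l in latencies_ms if l > sla.t_a_ms)
        return slow / len(latencies_ms) <= sla.p_ua

    def adapt_read_delay(read_delay_ms: float, latencies_ms,
                         sla: LatencySLA, step_ms: float = 1.0) -> float:
        """One control step: a larger read delay tends to improve consistency
        at the cost of latency, so probe upward while the latency SLA holds
        and back off more aggressively when it is violated."""
        if sla_met(latencies_ms, sla):
            return read_delay_ms + step_ms              # spend slack on consistency
        return max(0.0, read_delay_ms - 2 * step_ms)    # restore the latency SLA

Increasing the knob when the SLA holds and decreasing it more aggressively when it is violated mirrors the general idea of spending any latency slack on improved consistency while reacting quickly, under changing network conditions, to SLA violations.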