Abstract
We introduce consistency-aware durability, or CAD, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems; such a property can be particularly beneficial in geo-distributed and edge-computing scenarios. We build Orca, a modified version of ZooKeeper that implements CAD and cross-client monotonic reads. We experimentally show that Orca provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, Orca provides significantly higher throughput (1.8–3.3×) and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings. We also implement CAD in Redis and show that the performance benefits are similar to those of CAD’s implementation in ZooKeeper.
Strong and Efficient Consistency with Consistency-aware Durability