skip to main content
research-article
Public Access

Strong and Efficient Consistency with Consistency-aware Durability

Published:18 January 2021Publication History
Skip Abstract Section

Abstract

We introduce consistency-aware durability or Cad, a new approach to durability in distributed storage that enables strong consistency while delivering high performance. We demonstrate the efficacy of this approach by designing cross-client monotonic reads, a novel and strong consistency property that provides monotonic reads across failures and sessions in leader-based systems; such a property can be particularly beneficial in geo-distributed and edge-computing scenarios. We build Orca, a modified version of ZooKeeper that implements Cad and cross-client monotonic reads. We experimentally show that Orca provides strong consistency while closely matching the performance of weakly consistent ZooKeeper. Compared to strongly consistent ZooKeeper, Orca provides significantly higher throughput (1.8--3.3×) and notably reduces latency, sometimes by an order of magnitude in geo-distributed settings. We also implement Cad in Redis and show that the performance benefits are similar to that of Cad’s implementation in ZooKeeper.

References

  1. Ramnatthan Alagappan, Aishwarya Ganesan, Jing Liu, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. Fault-tolerance, fast and slow: Exploiting failure asynchrony in distributed systems. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI’18). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ramnatthan Alagappan, Aishwarya Ganesan, Yuvraj Patel, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Correlated crash vulnerabilities. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Apache. ZooKeeper. Retrieved from https://zookeeper.apache.org/.Google ScholarGoogle Scholar
  4. Apache. ZooKeeper Configuration Parameters. Retrieved from https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_configuration.Google ScholarGoogle Scholar
  5. Apache. ZooKeeper Guarantees, Properties, and Definitions. Retrieved from https://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_guaranteesPropertiesDefinitions.Google ScholarGoogle Scholar
  6. Apache. ZooKeeper Leader Activation. Retrieved from https://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection.Google ScholarGoogle Scholar
  7. Apache. ZooKeeper Overview. Retrieved from https://zookeeper.apache.org/doc/r3.5.1-alpha/zookeeperOver.html.Google ScholarGoogle Scholar
  8. Apache ZooKeeper. ZooKeeper Consistency Guarantees. Retrieved from https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#ch_zkGuarantees.Google ScholarGoogle Scholar
  9. Apache ZooKeeper. ZooKeeper Programmer’s Guide - ZooKeeper Stat Structure. Retrieved from https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_zkStatStructure.Google ScholarGoogle Scholar
  10. Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2013. Bolt-on causal consistency. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters, and Peng Li. 2011. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of the 8th Symposium on Networked Systems Design and Implementation (NSDI’11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Randal C. Burns, Robert M. Rees, and Darrell D. E. Long. 2001. An analytical study of opportunistic lease renewal. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS’01). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Vijay Chidambaram, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2013. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. James Cipar, Greg Ganger, Kimberly Keeton, Charles B. Morrey III, Craig A. N. Soules, and Alistair Veitch. 2012. LazyBase: Trading freshness for performance in a scalable database. In Proceedings of the EuroSys Conference (EuroSys’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, and Doug Terry. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the 26th ACM Symposium on Principles of Distributed Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2020. Strong and efficient consistency with consistency-aware durability. In Proceedings of the 18th USENIX Conference on File and Storage Technologies (FAST’20).Google ScholarGoogle Scholar
  18. Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. 2018. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In Proceedings of the 15th Symposium on Networked Systems Design and Implementation (NSDI’18). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Cary G. Gray and David Cheriton. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (SOSP’89). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Rachid Guerraoui, Matej Pavlovic, and Dragos-Adrian Seredinschi. 2016. Incremental consistency guarantees for replicated objects. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Maurice P. Herlihy and Jeannette M. Wing. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Prog. Lang. Syst. 12, 3 (1990). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Henrik Ingo and Aishwarya Ganesan. 2020. Discussion with Henrik Ingo. Retrieved from https://www.openlife.cc/comment/662091#comment-662091.Google ScholarGoogle Scholar
  24. Jonathan Corbet. 2009. O_*SYNC. Retrieved from https://lwn.net/Articles/350219/.Google ScholarGoogle Scholar
  25. Karthik Ranganathan. 2019. Low latency reads in geo-distributed SQL with raft leader leases. Retrieved from https://blog.yugabyte.com/low-latency-reads-in-geo-distributed-sql-with-raft-leader-leases/.Google ScholarGoogle Scholar
  26. Leslie Lamport, Robert Shostak, and Marshall Pease. 1982. The Byzantine generals problem. ACM Trans. Prog. Lang. Syst. 4, 3 (1982), 382--401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Collin Lee, Seo Jin Park, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout. 2015. Implementing linearizability at large scale and low latency. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Barbara Liskov and James Cowling. 2012. Viewstamped Replication Revisited. Technical Report MIT-CSAIL-TR-2012-021, MIT CSAIL, 2012.Google ScholarGoogle Scholar
  29. Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2011. Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. LogCabin. 2016. Retrieved from https://github.com/logcabin/logcabin.Google ScholarGoogle Scholar
  31. Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, and Wyatt Lloyd. 2015. Existential consistency: Measuring and understanding consistency at Facebook. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Syed Akbar Mehdi, Cody Littley, Natacha Crooks, Lorenzo Alvisi, Nathan Bronson, and Wyatt Lloyd. 2017. I can’t believe it’s not causal! Scalable causal consistency with no slowdown cascades. In Proceedings of the 14th Symposium on Networked Systems Design and Implementation (NSDI’17). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. MongoDB. 2020. MongoDB Read Preference. Retrieved from https://docs.mongodb.com/manual/core/read-preference/.Google ScholarGoogle Scholar
  34. MongoDB. 2020. MongoDB Replication. Retrieved from https://docs.mongodb.org/manual/replication/.Google ScholarGoogle Scholar
  35. MongoDB. 2018. Non-Blocking Secondary Reads. Retrieved from https://www.mongodb.com/blog/post/mongodb-40-nonblocking-secondary-reads.Google ScholarGoogle Scholar
  36. MongoDB. 2019. Read Concern Linearizable. Retrieved from https://docs.mongodb.com/manual/reference/read-concern-linearizable/.Google ScholarGoogle Scholar
  37. Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in egalitarian parliaments. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Seyed Hossein Mortazavi, Bharath Balasubramanian, Eyal de Lara, and Shankaranarayanan Puzhavakath Narayanan. 2018. Toward session consistency for the edge. In Proceedings of the USENIX Workshop on Hot Topics in Edge Computing (HotEdge’18).Google ScholarGoogle Scholar
  39. Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn. 2006. Rethink the sync. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Diego Ongaro. 2014. Consensus: Bridging Theory and Practice. PhD thesis. Stanford University, Stanford, CA.Google ScholarGoogle Scholar
  41. Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Dan R. K. Ports, Jialin Li, Vincent Liu, Naveen Kr. Sharma, and Arvind Krishnamurthy. 2015. Designing distributed systems using approximate synchrony in data center networks. In Proceedings of the 12th Symposium on Networked Systems Design and Implementation (NSDI’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. David Ratner, Peter Reiher, Gerald J. Popek, and Geoffrey H. Kuenning. 2001. Replication requirements in mobile environments. Mob. Netw. Applic. 6, 6 (2001), 525--533. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Redis. Redis. Retrieved from http://redis.io/.Google ScholarGoogle Scholar
  45. Redis. Redis Persistence. Retrieved from https://redis.io/topics/persistence.Google ScholarGoogle Scholar
  46. Redis. Redis Replication. Retrieved from http://redis.io/topics/replication.Google ScholarGoogle Scholar
  47. Redis. Redis Sentinel Documentation. Retrieved from https://redis.io/topics/sentinel.Google ScholarGoogle Scholar
  48. Redis. Redis WAIT. Retrieved from https://redis.io/commands/wait.Google ScholarGoogle Scholar
  49. Redis. Scaling Reads. Retrieved from https://redislabs.com/ebook/part-3-next-steps/chapter-10-scaling-redis/10-1-scaling-reads/.Google ScholarGoogle Scholar
  50. Retwis. 2014. Retwis. Retrieved from https://github.com/antirez/retwis.Google ScholarGoogle Scholar
  51. Robert Ricci, Eric Eide, and CloudLab Team. 2014. Introducing CloudLab: Scientific infrastructure for advancing cloud architectures and applications. USENIX ;login: 39, 6 (2014).Google ScholarGoogle Scholar
  52. Doug Terry. 2013. Replicated data consistency explained through baseball. Commun. ACM 56, 12 (2013), 82--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike J. Spreitzer, Marvin M. Theimer, and Brent B. Welch. 1994. Session guarantees for weakly consistent replicated data. In Proceedings of the 3rd International Conference on on Parallel and Distributed Information Systems (PDIS’94). Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Paolo Viotti and Marko Vukolić. 2016. Consistency in non-transactional distributed storage systems. ACM Comput. Surv. 49, 1 (2016), 19:1--19:34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Benjamin Wester, James Cowling, Edmund B. Nightingale, Peter M. Chen, Jason Flinn, and Barbara Liskov. 2009. Tolerating latency in replicated state machines through client speculation. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI’09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Youjip Won, Jaemin Jung, Gyeongyeol Choi, Joontaek Oh, Seongbae Son, Jooyoung Hwang, and Sangyeun Cho. 2018. Barrier-enabled IO stack for flash storage. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST’18). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Strong and Efficient Consistency with Consistency-aware Durability

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Storage
              ACM Transactions on Storage  Volume 17, Issue 1
              Special Section on Usenix Fast 2020
              February 2021
              165 pages
              ISSN:1553-3077
              EISSN:1553-3093
              DOI:10.1145/3446939
              • Editor:
              • Sam H. Noh
              Issue’s Table of Contents

              Copyright © 2021 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 18 January 2021
              • Accepted: 1 September 2020
              • Received: 1 June 2020
              Published in tos Volume 17, Issue 1

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader

            HTML Format

            View this article in HTML Format .

            View HTML Format
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!