skip to main content
research-article
Public Access

Building Consistent Transactions with Inconsistent Replication

Published:16 December 2018Publication History
Skip Abstract Section

Abstract

Application programmers increasingly prefer distributed storage systems with strong consistency and distributed transactions (e.g., Google’s Spanner) for their strong guarantees and ease of use. Unfortunately, existing transactional storage systems are expensive to use—in part, because they require costly replication protocols, like Paxos, for fault tolerance. In this article, we present a new approach that makes transactional storage systems more affordable: We eliminate consistency from the replication protocol, while still providing distributed transactions with strong consistency to applications.

We present the Transactional Application Protocol for Inconsistent Replication (TAPIR), the first transaction protocol to use a novel replication protocol, called inconsistent replication, that provides fault tolerance without consistency. By enforcing strong consistency only in the transaction protocol, TAPIR can commit transactions in a single round-trip and order distributed transactions without centralized coordination. We demonstrate the use of TAPIR in a transactional key-value store, TAPIR-KV. Compared to conventional systems, TAPIR-KV provides better latency and better throughput.

References

  1. Atul Adya, Robert Gruber, Barbara Liskov, and Umesh Maheshwari. 1995. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’95). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis. 2007. Sinfonia: A new paradigm for building scalable distributed systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. 2014. Highly available transactions: Virtues and limitations. In Proceedings of the Conference on Very Large Databases (VLDB’14).Google ScholarGoogle Scholar
  4. Jason Baker, Chris Bond, James Corbett, J. J. Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. 2011. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of the Conference on Innovative Data Systems Research (CIDR’11).Google ScholarGoogle Scholar
  5. Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran, Michael Wei, John D. Davis, Sriram Rao, Tao Zou, and Aviad Zuck. 2013. Tango: Distributed data structures over a shared log. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O’Neil, and Patrick O’Neil. 1995. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Philip A. Bernstein, Vassos Hadzilacos, and Nathan Goodman. 1987. Concurrency Control and Recovery in Database Systems. Addison Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ken Birman and Thomas A. Joseph. 1987. Exploiting virtual synchrony in distributed systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’87). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Mike Burrows. 2006. The Chubby lock service for loosely coupled distributed systems. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2, Article 4 (June 2008), 26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Yanzhe Chen, Xinda Wei, Jiaxin Shi, Rong Chen, and Haibo Chen. 2016. Fast and general distributed transactions using RDMA and HTM. In Proceedings of the 11th ACM SIGOPS EuroSys (EuroSys’16). ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert T. Morris, and Eddie Kohler. 2013. The scalable commutativity rule: Designing scalable software for multicore processors. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. 2008. PNUTS: Yahoo!’s hosted data serving platform. Proceedings of the Conference on Very Large Databases (VLDB’08).Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. James C. Corbett et al. 2012. Spanner: Google’s globally distributed database. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. James Cowling and Barbara Liskov. 2012. Granola: Low-overhead distributed transaction coordination. In Proceedings of the USENIX Annual Technical Conference (ATC’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. James Cowling, Daniel Myers, Barbara Liskov, Rodrigo Rodrigues, and Liuba Shrira. 2006. HQ replication: A hybrid quorum protocol for Byzantine fault tolerance. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon’s highly available key-value store. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Akon Dey, Alan Fekete, Raghunath Nambiar, and Uwe Rohm. 2014. YCSB+T: Benchmarking web-scale transactional databases. In Proceedings of the International Conference on Data Engineering Workshops (ICDEW’14).Google ScholarGoogle ScholarCross RefCross Ref
  20. Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI’14). USENIX. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, and Miguel Castro. 2015. No compromises: Distributed transactions with consistency, availability, and performance. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15). ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Jiaqing Du, Sameh Elnikety, and Willy Zwaenepoel. 2013. Clock-SI: Snapshot isolation for partitioned data stores using loosely synchronized clocks. In Proceedings of the 32nd IEEE Symposium on Reliable Distributed Systems (SRDS’13). IEEE. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Robert Escriva, Bernard Wong, and Emin Gun Sirer. 2013. Warp: Multi-Key Transactions for Key-Value Stores. Technical Report. Cornell.Google ScholarGoogle Scholar
  24. Michael J. Fischer, Nancy A. Lynch, and Michael S. Patterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (Apr. 1985), 374--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. David K. Gifford. 1979. Weighted voting for replicated data. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’79). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jim Gray and Leslie Lamport. 2006. Consensus on transaction commit. ACM Trans. Database Syst. 31, 1 (Mar. 2006), 133--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Patrick Hunt, Mahadev Konar, Flavio Paiva Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the USENIX Annual Technical Conference (ATC’10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Flavio Junqueira, Yanhua Mao, and Keith Marzullo. 2007. Classic Paxos vs Fast Paxos: Caveat emptor. In Proceedings of the 3rd Workshop on Hot Topics in System Dependability (HotDep’07). USENIX. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the ACM Symposium on Theory of Computing (STOC’97). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, and Alan Fekete. 2013. MDCC: Multi-data center consistency. In Proceedings of the ACM SIGOPS EuroSys (EuroSys’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Hsiang-Tsung Kung and John T. Robinson. 1981. On optimistic methods for concurrency control. ACM Trans. Database Syst. 6, 2 (June 1981), 213--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Rivka Ladin, Barbara Liskov, Liuba Shrira, and Sanjay Ghemawat. 1992. Providing high availability using lazy replication. ACM Trans. Comput. Syst. 10, 4 (Nov. 1992), 360--391. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Avinash Lakshman and Prashant Malik. 2010. Cassandra: A decentralized structured storage system. ACM SIGOPS Operat. Syst. Rev. 44, 2 (Apr. 2010), 35--40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Leslie Lamport. 1994. ACM Trans. Prog. Lang. Syst. 16, 3 (May 1994), 872--923. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Leslie Lamport. 2001. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 51--58.Google ScholarGoogle Scholar
  36. Leslie Lamport. 2005. Generalized Consensus and Paxos. Technical Report 2005-33. Microsoft Research.Google ScholarGoogle Scholar
  37. Leslie Lamport. 2006a. Fast Paxos. Distrib. Comput. 19, 2 (2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Leslie Lamport. 2006b. Lower bounds for asynchronous consensus. Distrib. Comput. 19, 2 (Oct. 2006), 104--125. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Costin Leau. 2013. Spring Data Redis--Retwis-J. Retrieved from http://docs.spring.io/spring-data/data-keyvalue/examples/retwisj/current/.Google ScholarGoogle Scholar
  40. Jialin Li, Ellis Michael, and Dan R. K. Ports. 2017. Eris: Coordination-free consistent transactions using network multi-sequencing. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP’17). ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Jialin Li, Ellis Michael, Adriana Szekeres, Naveen Kr. Sharma, and Dan R. K. Ports. 2016. Just say no to Paxos overhead: Replacing consensus with network ordering. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). USENIX. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Barbara Liskov, Miguel Castro, Liuba Shrira, and Atul Adya. 1999. Providing persistent objects in distributed systems. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP’99). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Barbara Liskov and James Cowling. 2012. Viewstamped replication revisited. Technical report MIT-CSAIL-TR-2012-021. MIT.Google ScholarGoogle Scholar
  44. Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2011. Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Hatem Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, and Amr El Abbadi. 2013. Low-latency multi-datacenter databases using replicated commit. Proceedings of the Conference on Very Large Databases (VLDB’13).Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Dahlia Malkhi and Michael Reiter. 1998. Byzantine quorum systems. Distrib. Comput. 11 (1998), 203--213. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. MongoDB. 2013. MongoDB: A open-source document database. Retrieved from http://www.mongodb.org/.Google ScholarGoogle Scholar
  48. Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in Egalitarian parliaments. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li. 2014. Extracting more concurrency from distributed transactions. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Shuai Mu, Lamont Nelson, Wyatt Lloyd, and Jinyang Li. 2016. Consolidating concurrency control and consensus for commits under conflicts. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). USENIX. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Brian M. Oki and Barbara H. Liskov. 1988. Viewstamped replication: A new primary copy method to support highly available distributed systems. In Proceedings of the ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC’88). Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Dan R. K. Ports, Jialin Li, Vincent Liu, Naveen Kr. Sharma, and Arvind Krishnamurthy. 2015. Designing distributed systems using approximate synchrony in data center networks. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Redis. 2013. Redis: Open Source Data Structure Server. Retrieved from http://redis.io/.Google ScholarGoogle Scholar
  54. Yasushi Saito and Marc Shapiro. 2005. Optimistic replication. Comput. Surveys 37, 1 (Mar. 2005), 42--81. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Salvatore Sanfilippo. 2013. WAIT: Synchronous replication for Redis. Retrieved from http://antirez.com/news/66.Google ScholarGoogle Scholar
  56. Yee Jiun Song and Robbert van Renesse. 2008. Bosco: One-step Byzantine asynchronous consensus. In Proceedings of the International Symposium on Distributed Computing (DISC’08).Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. 2011. Transactional storage for geo-replicated systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’11). Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. 1995. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’95). Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Robert H. Thomas. 1979. A majority consensus approach to concurrency control for multiple copy databases. ACM Trans. Database Syst. 4, 2 (June 1979), 180--209. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast in-memory transaction processing using RDMA and HTM. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP’15). ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan R. K. Ports. 2015a. Building consistent transactions with inconsistent replication. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan R. K. Ports. 2015b. Building Consistent Transactions with Inconsistent Replication (extended version). Technical Report 2014-12-01 v2. University of Washington. Retrieved from http://syslab.cs.washington.edu/papers/tapir-tr-v2.pdf.Google ScholarGoogle Scholar
  63. Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, Marcos K. Aguilera, and Jinyang Li. 2013. Transaction chains: Achieving serializability with low latency in geo-distributed storage systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP’13). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Building Consistent Transactions with Inconsistent Replication

        Recommendations

        Reviews

        George R. Mayforth

        Large-scale transaction processing systems must balance the need for minimal response times with maintaining the integrity of the data being processed. In theory, transactions are processed sequentially in the order received. Very large systems exceed the capacity of even a supercomputer; therefore, the data processing must be divided among multiple systems. Consider the classic example: an airline reservation system. Multiple agents globally enter hundreds of thousands (or millions) of transactions daily, linking customers to available seats on specific flights. The traditional way to handle such a huge transaction load takes two forms. First, the total database is divided into "shards," that is, segments that are parts of the whole but can be treated independently. Within each shard there may be replicas, that is, multiple instances of the same data in that shard. The redundancy ensures the integrity of the data. In this complex system there remains the requirement of apparent serial processing, that is, transactions handled in chronological sequence. This implies coordination and synchronization among the replicas in a shard because their data cannot be treated independently. A strict synchronization protocol further implies fairly large overhead as replicas communicate with each other to maintain the integrity of the process. The paper proposes to reduce processing overhead by means of an "inconsistent" replication protocol that permits replicas to process transactions in different sequences. Maintaining the correct sequence of transaction processing is accomplished by pushing the maintenance of serial integrity up to the transaction processing and application layers. This reduces the replication overhead significantly. The authors provide a detailed description of the inconsistent replication (IR) protocol and associated transaction processing system, transactional application protocol for inconsistent replication (TAPIR), including handling failures and offering proofs of correctness regarding transaction consistency. They compare the proposed system to multiple existing systems, including experimental results running TAPIR-IR plus two similar systems built using traditional protocols. The experiments involved transaction processing and data replication in the US, Europe, and Asia. The experimental results show increased throughput by up to a factor of three times versus the two conventional systems. Given the potential performance increase, real-world testing is needed to verify the experimental results. The text is clearly written, supplemented with useful and readable charts. Many comparisons to earlier systems and a large set of references are provided. This paper should be of interest to those who design and implement large-scale transaction processing systems.

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        HTML Format

        View this article in HTML Format .

        View HTML Format
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!