Abstract
As raw system performance continues to improve at exponential rates, the utility of many services is increasingly limited by availability rather than performance. A key approach to improving availability involves replicating the service across multiple wide-area sites. However, replication introduces well-known trade-offs between service consistency and availability. This article therefore explores the benefits of dynamically trading consistency for availability using a continuous consistency model. In this model, applications specify a maximum deviation from strong consistency on a per-replica basis. In this article, we: i) evaluate the availability of a prototype replication system running across the Internet as a function of consistency level, consistency protocol, and failure characteristics, ii) demonstrate that simple optimizations to existing consistency protocols result in significant availability improvements (more than an order of magnitude in some scenarios), iii) use our experience with these optimizations to prove a tight upper bound on the availability of services, and iv) show that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity, resulting in a trade-off between communication and availability.