Abstract
The task of consistently and reliably replicating data is fundamental in distributed systems, and numerous existing protocols are able to achieve such replication efficiently. When called on to build a large-scale enterprise storage system with built-in replication, we were therefore surprised to discover that no existing protocols met our requirements. As a result, we designed and deployed a new replication protocol called Niobe. Niobe is in the primary-backup family of protocols, and shares many similarities with other protocols in this family. But we believe Niobe is significantly more practical for large-scale enterprise storage than previously published protocols. In particular, Niobe is simple, flexible, has rigorously proven yet simply stated consistency guarantees, and exhibits excellent performance. Niobe has been deployed as the backend for a commercial Internet service; its consistency properties have been proved formally from first principles, and further verified using the TLA+ specification language. We describe the protocol itself, the system built to deploy it, and some of our experiences in doing so.
Niobe: A practical replication protocol