research-article

Niobe: A practical replication protocol

Published: 25 February 2008

Abstract

The task of consistently and reliably replicating data is fundamental in distributed systems, and numerous existing protocols are able to achieve such replication efficiently. When called on to build a large-scale enterprise storage system with built-in replication, we were therefore surprised to discover that no existing protocols met our requirements. As a result, we designed and deployed a new replication protocol called Niobe. Niobe is in the primary-backup family of protocols, and shares many similarities with other protocols in this family. But we believe Niobe is significantly more practical for large-scale enterprise storage than previously published protocols. In particular, Niobe is simple, flexible, has rigorously proven yet simply stated consistency guarantees, and exhibits excellent performance. Niobe has been deployed as the backend for a commercial Internet service; its consistency properties have been proved formally from first principles, and further verified using the TLA+ specification language. We describe the protocol itself, the system built to deploy it, and some of our experiences in doing so.

Published in

ACM Transactions on Storage, Volume 3, Issue 4
February 2008
156 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1326542

Copyright © 2008 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 25 February 2008
• Accepted: 1 December 2007
• Received: 1 July 2007
