research-article

Throughput optimal total order broadcast for cluster environments

Published: 26 July 2010

Abstract

Total order broadcast is a fundamental communication primitive that plays a central role in bringing cheap software-based high availability to a wide range of services. This article studies the practical performance of such a primitive on a cluster of homogeneous machines.

We present LCR, the first throughput optimal uniform total order broadcast protocol. LCR is based on a ring topology. It only relies on point-to-point inter-process communication and has a linear latency with respect to the number of processes. LCR is also fair in the sense that each process has an equal opportunity of having its messages delivered by all processes.

We benchmark a C implementation of LCR against Spread and JGroups, two of the most widely used group communication packages. LCR provides higher throughput than the alternatives, over a large number of scenarios.
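The abstract's claim that a ring built on point-to-point links gives latency linear in the number of processes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the LCR protocol: the function name `ring_broadcast` and the hop-counting model are hypothetical, and ordering, uniformity, fairness, and failure handling are all left out.

```python
def ring_broadcast(n, sender):
    """Simulate one message travelling around a ring of n processes.

    Assumption (not from the paper): each process sends only to its
    successor, and the message stops once every other process has it.
    Returns (hops, order in which processes received the message).
    """
    if n < 1 or not 0 <= sender < n:
        raise ValueError("need n >= 1 and 0 <= sender < n")
    received = []
    current = sender
    while len(received) < n - 1:
        current = (current + 1) % n  # forward to the successor on the ring
        received.append(current)
    return len(received), received


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        hops, _ = ring_broadcast(n, sender=0)
        print(n, hops)  # hops equals n - 1, so it grows linearly with n
```

In this simplified model a message needs n - 1 point-to-point hops to reach every other process, which is the sense in which latency grows linearly with the ring size.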

Published in

ACM Transactions on Computer Systems, Volume 28, Issue 2
July 2010
86 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/1813654

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 26 July 2010
• Accepted: 1 March 2010
• Revised: 1 October 2009
• Received: 1 April 2008

Qualifiers

• research-article
• Research
• Refereed
