Abstract
Total order broadcast is a fundamental communication primitive that plays a central role in bringing cheap software-based high availability to a wide range of services. This article studies the practical performance of such a primitive on a cluster of homogeneous machines.
We present LCR, the first throughput-optimal uniform total order broadcast protocol. LCR is based on a ring topology. It relies only on point-to-point inter-process communication, and its latency is linear in the number of processes. LCR is also fair, in the sense that each process has an equal opportunity of having its messages delivered by all processes.
We benchmark a C implementation of LCR against Spread and JGroups, two of the most widely used group communication packages. LCR provides higher throughput than the alternatives, over a large number of scenarios.
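As a rough illustration of why a ring built on point-to-point links has latency that grows linearly with the number of processes, the Python sketch below counts the forwarding hops a message needs to reach every process. It is not the LCR protocol: it models only dissemination around the ring, and it ignores ordering, acknowledgements, and failures. The function names and the convention that each process forwards to its successor `(i + 1) % n` are assumptions made for this example.

```python
def ring_broadcast_hops(n: int, sender: int) -> dict[int, int]:
    """Hops before each process receives a message sent by `sender`.

    Hypothetical model: n processes numbered 0..n-1 form a ring, and each
    process forwards the message to its successor over a point-to-point link.
    """
    if n < 1 or not 0 <= sender < n:
        raise ValueError("need n >= 1 and 0 <= sender < n")
    hops = {sender: 0}  # the sender already holds its own message
    current = sender
    for step in range(1, n):
        current = (current + 1) % n  # forward to the successor on the ring
        hops[current] = step
    return hops


def worst_case_hops(n: int) -> int:
    """Hops needed for the last process on the ring to receive the message."""
    return max(ring_broadcast_hops(n, 0).values())


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, worst_case_hops(n))
```

In this simplified model the last process receives the message after n - 1 hops, so the hop count grows linearly with the ring size.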
Index Terms
Throughput optimal total order broadcast for cluster environments