Abstract
We investigate the complexity of algorithms in message-driven models. In such models, events in the computation can only be caused by message receptions, not by the passage of time. Hutle and Widder [2005a] have shown that, in systems with uncertain message delays and channels of unknown capacity, there is no deterministic message-driven self-stabilizing implementation of the eventually strong failure detector, and hence of Ω, that uses only bounded space. Under stronger assumptions, it has been shown that even the eventually perfect failure detector can be implemented in message-driven systems consisting of at least f + 2 processes, where f is the upper bound on the number of processes that crash during an execution.
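The distinction between message-driven and time-driven models can be illustrated with a minimal Python sketch, assuming a toy simulator. The hypothetical class `MessageDrivenProcess` has a single handler, `on_receive`, and deliberately no timer callback, so a process takes a step only when a message is delivered to it. The ping/echo payloads, the initial messages, and the FIFO delivery order are illustrative assumptions; in the model itself, messages may be delayed arbitrarily (but finitely) and delivered in any order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MessageDrivenProcess:
    """Hypothetical process in a message-driven model: it has no timers,
    so its only way to take a step is to react to a received message."""
    pid: int
    steps: int = 0
    received: list = field(default_factory=list)

    def on_receive(self, sender: int, payload: str) -> list:
        # One atomic step, caused solely by this message reception.
        self.steps += 1
        self.received.append((sender, payload))
        # Answer pings with an echo; echoes trigger no further messages.
        if payload == "ping":
            return [(sender, "echo")]
        return []


def run(processes: dict, initial: list) -> None:
    """Deliver messages until none are in transit. Delivery is FIFO here for
    simplicity (an assumption); the model allows arbitrary finite delays."""
    in_transit = deque(initial)  # entries are (sender, dest, payload)
    while in_transit:
        sender, dest, payload = in_transit.popleft()
        for new_dest, new_payload in processes[dest].on_receive(sender, payload):
            in_transit.append((dest, new_dest, new_payload))
    # Once nothing is in transit, no process can ever take another step:
    # the passage of time alone causes no events.


procs = {i: MessageDrivenProcess(i) for i in range(3)}
# Initial messages are assumed to be sent by process 0 at start-up.
run(procs, [(0, 1, "ping"), (0, 2, "ping")])
```

Every step in this run corresponds to exactly one message delivery (four in total), and once no message is in transit the system stays quiescent forever. A time-driven variant would add a handler that fires after a local timeout even when nothing arrives, which is precisely what the message-driven model excludes.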
In this article, we show that f + 2 is in fact a lower bound in message-driven systems, even if nonstabilizing algorithms are considered. This contrasts with time-driven models, where f + 1 processes suffice for failure detector implementations.
Moreover, we investigate algorithms in which not all processes send messages (i.e., are active); instead, some processes in a predetermined set remain passive. We show that in message-driven systems all of the f + 2 required processes must be active, whereas in time-driven systems it suffices that f processes are active.
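For concreteness, the process counts stated above can be tabulated for a given f. The helper `minimal_processes` below is hypothetical and only restates the abstract: the message-driven figures are lower bounds (on the total number of processes and on the number of active ones), while the time-driven figures are counts that suffice. Pairing the two time-driven figures into a single configuration (f + 1 processes, f of them active) is an assumption of this sketch.

```python
def minimal_processes(model: str, f: int) -> tuple[int, int]:
    """Return (total processes, active processes) for tolerating up to f
    crashes, as stated in the abstract. Hypothetical helper."""
    if f < 0:
        raise ValueError("f must be non-negative")
    if model == "message-driven":
        # Lower bound: at least f + 2 processes, and all f + 2 must be active.
        return f + 2, f + 2
    if model == "time-driven":
        # Sufficient: f + 1 processes; f active processes suffice
        # (combining the two into one configuration is an assumption).
        return f + 1, f
    raise ValueError(f"unknown model: {model!r}")


for f in (1, 2, 3):
    print(f, minimal_processes("message-driven", f), minimal_processes("time-driven", f))
```

For f = 1, for instance, a message-driven system needs at least three processes, all of them active, whereas in the time-driven setting two processes, only one of them active, suffice under the assumption above.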
We also provide message-driven implementations of Ω. Our algorithms are efficient in the sense that not all processes have to send messages forever, which improves on previous message-driven failure detector implementations.
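Ω outputs, at each process, the identity of a trusted leader, and requires that eventually all correct processes permanently trust the same correct process. The following Python sketch checks this property on a finite recorded trace. Because "eventually" cannot be decided from a finite prefix, it assumes the trace is long enough and reports the first snapshot from which all correct processes agree on the same correct leader until the end of the trace. The trace format (a list of snapshots mapping each process id to its current leader) and the function name `omega_stabilization_point` are hypothetical.

```python
from typing import Optional


def omega_stabilization_point(trace: list[dict[int, int]], correct: set[int]) -> Optional[int]:
    """Return the first index k such that, in every snapshot of trace[k:],
    all correct processes trust the same leader and that leader is correct.
    Return None if the final snapshot already violates this.

    trace[t][p] is the leader that process p trusts at snapshot t
    (hypothetical format; crashed processes may be absent)."""
    if not trace or not correct:
        return None
    final = trace[-1]
    leaders = {final.get(p) for p in correct}
    if len(leaders) != 1:
        return None  # correct processes disagree at the end of the trace
    (leader,) = leaders
    if leader not in correct:
        return None  # the common leader is missing or not a correct process
    k = len(trace) - 1
    # Walk backwards while earlier snapshots agree on the same correct leader.
    while k > 0 and all(trace[k - 1].get(p) == leader for p in correct):
        k -= 1
    return k


trace = [
    {0: 0, 1: 1, 2: 2},  # no agreement yet
    {0: 0, 1: 0, 2: 0},  # all trust process 0, which later crashes
    {1: 1, 2: 1},        # process 0 has crashed; 1 and 2 trust 1
    {1: 1, 2: 1},
]
print(omega_stabilization_point(trace, correct={1, 2}))  # prints 2
```

In the example, the correct processes 1 and 2 briefly trust process 0, which later crashes, so the check reports stabilization only from snapshot 2 on. Such a checker says nothing about message efficiency; it only validates the leader-election guarantee an implementation must provide.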
References
- Aguilera, M. K., Delporte-Gallet, C., Fauconnier, H., and Toueg, S. 2003. On implementing Omega with weak reliability and synchrony assumptions. In Proceedings of the 22nd Annual ACM Symposium on Principles of Distributed Computing (PODC'03). ACM Press, New York, NY, 306--314.
- Aguilera, M. K., Delporte-Gallet, C., Fauconnier, H., and Toueg, S. 2004. Communication efficient leader election and consensus with limited link synchrony. In Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing (PODC'04). ACM Press, 328--337.
- Beauquier, J. and Kekkonen-Moneta, S. 1997. Fault-tolerance and self-stabilization: Impossibility results and solutions using self-stabilizing failure detectors. Int. J. Syst. Sci. 28, 11, 1177--1187.
- Chandra, T. D., Hadzilacos, V., and Toueg, S. 1996. The weakest failure detector for solving consensus. J. ACM 43, 4, 685--722.
- Chandra, T. D. and Toueg, S. 1996. Unreliable failure detectors for reliable distributed systems. J. ACM 43, 2, 225--267.
- Dolev, D., Dwork, C., and Stockmeyer, L. 1987. On the minimal synchronism needed for distributed consensus. J. ACM 34, 1, 77--97.
- Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2, 288--323.
- Ebergen, J. C. 1991. A formal approach to designing delay-insensitive circuits. Distrib. Comput. 5, 107--119.
- Einstein, A. 1905. Zur Elektrodynamik bewegter Körper. Annalen der Physik 322, 10, 891--921.
- Fetzer, C., Schmid, U., and Süsskraut, M. 2005. On the possibility of consensus in asynchronous systems with finite average response times. In Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS'05). IEEE Computer Society, 271--280.
- Fischer, M. and Lamport, L. 1982. Byzantine generals and transaction commit protocols. Tech. rep. 62, SRI International.
- Fischer, M. J., Lynch, N. A., and Paterson, M. S. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2, 374--382.
- Fuegger, M., Schmid, U., Fuchs, G., and Kempf, G. 2006. Fault-tolerant distributed clock generation in VLSI systems-on-chip. In Proceedings of the 6th European Dependable Computing Conference (EDCC-6). IEEE Computer Society Press, 87--96.
- Gärtner, F. C. and Pleisch, S. 2001. (Im)possibilities of predicate detection in crash-affected systems using interrupt-style failure detectors. In Brief Announcements—15th International Symposium on DIStributed Computing (DISC'01), J. Welch, Ed. Tech. rep. TR-01-7. Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal. 7--12. http://www.di.fc.ul.pt/publications/di-fcul-tr-01-7_document.pdf.
- Guerraoui, R. and Schiper, A. 1996. “Γ-accurate” failure detectors. In Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG'96). Lecture Notes in Computer Science, vol. 1151. Springer Verlag, 269--286.
- Hermant, J.-F. and Widder, J. 2005. Implementing reliable distributed real-time systems with the Θ-model. In Proceedings of the 9th International Conference on Principles of Distributed Systems (OPODIS'05). Lecture Notes in Computer Science, vol. 3974. Springer Verlag, 334--350.
- Hutle, M., Malkhi, D., Schmid, U., and Zhou, L. 2006. Brief announcement: Chasing the weakest system model for implementing omega and consensus. In Proceedings of the 8th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS'06). Lecture Notes in Computer Science, vol. 4280. Springer Verlag, 576--577.
- Hutle, M., Malkhi, D., Schmid, U., and Zhou, L. 2008. Chasing the weakest system model for implementing omega and consensus. IEEE Trans. Depend. Secure Comput. To appear.
- Hutle, M. and Widder, J. 2005a. On the possibility and the impossibility of message-driven self-stabilizing failure detection. In Proceedings of the 7th International Symposium on Self-Stabilizing Systems (SSS'05). Lecture Notes in Computer Science, vol. 3764. Springer Verlag, 153--170. Also appeared as a brief announcement in Proceedings of the 24th ACM Symposium on Principles of Distributed Computing (PODC'05).
- Hutle, M. and Widder, J. 2005b. Self-stabilizing failure detector algorithms. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN'05). IASTED/ACTA Press, 485--490.
- Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. Comm. ACM 21, 7, 558--565.
- Le Lann, G. and Schmid, U. 2003. How to implement a timer-free perfect failure detector in partially synchronous systems. Tech. rep. 183/1-127, Department of Automation, Technische Universität Wien.
- Lynch, N. 1996. Distributed Algorithms. Morgan Kaufmann Publishers, Inc., San Francisco CA.
- Malkhi, D., Oprea, F., and Zhou, L. 2005. Ω meets Paxos: Leader election and stability without eventual timely links. In Proceedings of the 19th Symposium on Distributed Computing (DISC'05). Lecture Notes in Computer Science, vol. 3724. Springer Verlag, 199--213.
- Pease, M., Shostak, R., and Lamport, L. 1980. Reaching agreement in the presence of faults. J. ACM 27, 2, 228--234.
- Robinson, P. and Schmid, U. 2008. Brief announcement: The asynchronous bounded-cycle model. In Proceedings of the 27th ACM Symposium on Principles of Distributed Computing (PODC'08). ACM Press, 423.
- Santoro, N. and Widmayer, P. 1989. Time is not a healer. In Proceedings of the 6th Annual Symposium on Theoretical Aspects of Computer Science (STACS'89). Lecture Notes in Computer Science, vol. 349. Springer Verlag, 304--313.
- Srikanth, T. K. and Toueg, S. 1987. Optimal clock synchronization. J. ACM 34, 3, 626--645.
- Widder, J. 2003. Booting clock synchronization in partially synchronous systems. In Proceedings of the 17th International Symposium on Distributed Computing (DISC'03). Lecture Notes in Computer Science, vol. 2848. Springer Verlag, 121--135.
- Widder, J. 2004. Distributed computing in the presence of bounded asynchrony. Ph.D. thesis, Vienna University of Technology, Fakultät für Informatik.
- Widder, J., Le Lann, G., and Schmid, U. 2005. Failure detection with booting in partially synchronous systems. In Proceedings of the 5th European Dependable Computing Conference (EDCC-5). Lecture Notes in Computer Science, vol. 3463. Springer Verlag, 20--37.
Index Terms
Optimal message-driven implementations of omega with mute processes





