Abstract
An indulgent algorithm is a distributed algorithm that, besides tolerating process failures, also tolerates unreliable information about the interleaving of the processes. This article presents a general characterization of indulgence in an abstract computing model that encompasses various communication and resilience schemes. We use our characterization to establish several results about the inherent power and limitations of indulgent algorithms.