On scalable and efficient distributed failure detectors
ABSTRACT
Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In this paper, we look at quantifying the optimal scalability, in terms of network load (in messages per second, with messages having a size limit), of distributed, complete failure detectors as a function of application-specified requirements. These requirements are 1) quick failure detection by some non-faulty process, and 2) accuracy of failure detection. We assume a crash-recovery (non-Byzantine) failure model, and a network model that is probabilistically unreliable (w.r.t. message deliveries and process failures). First, we characterize, under certain independence assumptions, the optimum worst-case network load imposed by any failure detector that achieves an application's requirements. We then discuss why traditional heartbeating schemes are inherently unscalable according to the optimal load. We also present a randomized, distributed failure detector algorithm that imposes an equal expected load per group member. This protocol satisfies the application-defined constraints of completeness and accuracy, and speed of detection on average. It imposes a network load that differs from the optimal by a sub-optimality factor that is much lower than that for traditional distributed heartbeating schemes. Moreover, this sub-optimality factor does not vary with group size (for large groups).
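The scalability contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the paper's protocol or its analysis: it assumes a lossless network, a naive all-to-all heartbeating baseline in which each of n members sends one heartbeat to every other member per period, and a hypothetical randomized scheme in which each member pings one uniformly chosen peer per period and gets an ack back. All function names are illustrative.

```python
import random


def all_to_all_heartbeat_load(n: int) -> int:
    """Messages per period if every member heartbeats every other member
    (assumed baseline, not taken from the paper)."""
    return n * (n - 1)


def randomized_ping_load(n: int) -> int:
    """Messages per period if each member sends one ping and receives one ack
    (hypothetical randomized scheme, assuming no message loss)."""
    return 2 * n


def simulate_random_ping_round(n: int, rng: random.Random) -> list[int]:
    """Simulate one period of the hypothetical scheme for n >= 2 members.

    Returns the number of messages (pings plus acks) received by each member.
    """
    received = [0] * n
    for sender in range(n):
        # Pick a target uniformly among the other n - 1 members.
        target = rng.randrange(n - 1)
        if target >= sender:
            target += 1
        received[target] += 1  # the ping arrives at the target
        received[sender] += 1  # the ack arrives back at the sender
    return received


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, all_to_all_heartbeat_load(size), randomized_ping_load(size))
```

Under these assumptions the baseline load grows quadratically with group size while the randomized scheme grows linearly, and each member expects to receive two messages per period (one ack plus, on average, one ping), which matches the "equal expected load per group member" property the abstract describes.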
AUTHORS
Indranil Gupta
Tushar D. Chandra
Germán S. Goldszmidt
CITED BY
33 Citations
PUBLICATION
| Title | PODC '01 Proceedings of the twentieth annual ACM symposium on Principles of distributed computing |
| Chairmen | Ajay Kshemkalyani (Univ. of Illinois at Chicago, Chicago); Nir Shavit (Tel-Aviv Univ. and Sun Labs.) |
| Pages | 170-179 |
| Publication Date | 2001-08-01 (yyyy-mm-dd) |
| Sponsors | SIGACT (ACM Special Interest Group on Algorithms and Computation Theory); SIGOPS (ACM Special Interest Group on Operating Systems) |
| Publisher | ACM, New York, NY, USA ©2001 |
| ISBN | 1-58113-383-9 |
| Order Number | 536010 |
| DOI | 10.1145/383962.384010 |
| Conference | PODC (Principles of Distributed Computing) |
| Paper Acceptance Rate | 39 of 118 submissions, 33% |
| Overall Acceptance Rate | 942 of 3,170 submissions, 30% |
Table of Contents
| Lamport on mutual exclusion: 27 years of planting seeds | |
| James H. Anderson | |
| Pages: 3-12 | |
| doi>10.1145/383962.383967 | |
Mutual exclusion is a topic that Leslie Lamport has returned to many times throughout his career. This article, which is being written in celebration of Lamport's sixtieth birthday, is an attempt to survey some of his many contributions to research on ...
| The ABCD's of Paxos | |
| Butler Lampson | |
| Page: 13 | |
| doi>10.1145/383962.383969 | |
We explain how consensus is used to implement replicated state machines, the general mechanism for fault-tolerance. We describe an abstract version of Lamport's Paxos algorithm for asynchronous consensus. Then we derive the Byzantine, classic, and disk ...
| Sticks and stones: a coding scheme for parameterized verification | |
| Amir Pnueli | |
| Page: 14 | |
| doi>10.1145/383962.383971 | |
We consider the problem of Uniform Algorithmic Verification of Parameterized Systems, which requires establishing in a single verification effort the correctness of a parameterized family of systems for any value of the parameter. As has been observed ...
| Leslie Lamport's properties and actions | |
| Martín Abadi | |
| Page: 15 | |
| doi>10.1145/383962.383973 | |
Since the 1970s, Leslie Lamport has done substantial work on specification and verification methods. This work might be regarded as complementary to his other celebrated research on concurrency and distributed computing, perhaps sometimes even as secondary. ...
| Implementing atomic objects in a dynamic environment | |
| Nancy Lynch | |
| Page: 16 | |
| doi>10.1145/383962.383977 | |
This talk will describe a new algorithm for implementing atomic objects in distributed settings where processes may fail (by stopping), and may also join and leave voluntarily. This strategy builds on Lamport's Paxos algorithm, and also on work by [Yeger-Lotem, ...
| The L |
|
| Chris Rowley | |
| Pages: 17-25 | |
| doi>10.1145/383962.383978 | |
The second edition of The Manual [23] begins: `LATEX is a system for typesetting documents. Its first widely available version, mysteriously numbered 2.09, appeared in 1985.'
It is too early for a complete critical assessment of the impact of LATEX ...
| On beyond registers: wait-free readable objects | |
| Maurice Herlihy | |
| Pages: 26-42 | |
| doi>10.1145/383962.383979 | |
Leslie Lamport was the first to pose many of the fundamental questions about synchronization that drive much of our community's research, even today. In this paper, we revisit some of Lamport's classic questions in a modern context. In particular, we ...
| Restoration by path concatenation: fast recovery of MPLS paths | |
| Anat Bremler-Barr, Yehuda Afek, Haim Kaplan, Edith Cohen, Michael Merritt | |
| Pages: 43-52 | |
| doi>10.1145/383962.383980 | |
A new general theory about restoration of network paths is first introduced. The theory pertains to restoration of shortest paths in a network following failure, e.g., we prove that a shortest path in a network after removing k edges is ...
| Computing almost shortest paths | |
| Michael Elkin | |
| Pages: 53-62 | |
| doi>10.1145/383962.383983 | |
We study the s-sources almost shortest paths (shortly, s-ASP) problem. Given an unweighted graph G = (V, E), and a subset S ⊆ V of s nodes, the goal is to compute almost shortest paths between ...
| Distributed MST for constant diameter graphs | |
| Zvi Lotker, Boaz Patt-Shamir, David Peleg | |
| Pages: 63-71 | |
| doi>10.1145/383962.383984 | |
This paper considers the problem of distributively constructing a minimum-weight spanning tree (MST) for graphs of constant diameter in the bounded-messages model, where each message can contain at most B bits for some parameter B. It is ...
| String realizers of posets with applications to distributed computing | |
| Vijay K. Garg, Chakarat Skawratananond | |
| Pages: 72-80 | |
| doi>10.1145/383962.383988 | |
In this paper, we show the connection between vector clocks used in distributed computing and dimension theory of partially ordered sets. Based on this connection, we provide lower bounds on the number of coordinates for timestamping events in a distributed ...
| On the generalized dining philosophers problem | |
| Oltea Mihaela Herescu, Catuscia Palamidessi | |
| Pages: 81-89 | |
| doi>10.1145/383962.383994 | |
We consider a generalization of the dining philosophers problem to arbitrary connection topologies. We focus on symmetric, fully distributed systems, and we address the problem of guaranteeing progress and lockout-freedom, even in presence of adversary ...
| An improved lower bound for the time complexity of mutual exclusion | |
| James H. Anderson, Yong-Jik Kim | |
| Pages: 90-99 | |
| doi>10.1145/383962.383996 | |
We establish a lower bound of Ω(log N/log log N) remote memory references for N-process mutual exclusion algorithms based on reads, writes, or comparison primitives. Our bound improves an earlier bound of Ω(log log N/log ...
| A note on group mutual exclusion | |
| Vassos Hadzilacos | |
| Pages: 100-106 | |
| doi>10.1145/383962.383997 | |
Group mutual exclusion is a natural problem, formulated by Joung in 1998, that generalises the classical mutual exclusion problem. In group mutual exclusion a process requests a “session” before entering its critical section; processes are ...
| Nearly optimal perfectly-periodic schedules | |
| Amotz Bar-Noy, Aviv Nisgav, Boaz Patt-Shamir | |
| Pages: 107-116 | |
| doi>10.1145/383962.383998 | |
We consider the problem of scheduling a set of jobs on a single shared resource using time-multiplexing. A perfectly-periodic schedule is one where resource time is divided into equal size “time-slots” quanta, and each job gets a time ...
| The do-all problem in broadcast networks | |
| Bogdan S. Chlebus, Dariusz R. Kowalski, Andrzej Lingas | |
| Pages: 117-127 | |
| doi>10.1145/383962.384000 | |
The problem of performing t tasks in a distributed system on p failure-prone processors is one of the fundamental problems in distributed computing. If the tasks are similar and independent and the processors communicate by sending messages ...
| Competitive concurrent distributed queuing | |
| Maurice Herlihy, Srikanta Tirthapura, Roger Wattenhofer | |
| Pages: 127-133 | |
| doi>10.1145/383962.384001 | |
Distributed queuing is a fundamental problem in distributed computing, arising in a variety of applications. The challenge in designing a distributed queuing algorithm is to minimize message traffic and delay.
This paper gives a novel competitive analysis ...
| Bandwidth constrained placement in a WAN | |
| Arun Venkataramani, Phoebe Weidmann, Mike Dahlin | |
| Pages: 134-143 | |
| doi>10.1145/383962.384002 | |
In this paper, we examine the bandwidth-constrained placement problem, focusing on trade-offs appropriate for wide area network (WAN) environments. The goal is to place copies of objects at a collection of distributed caches to minimize expected ...
| Compressed bloom filters | |
| Michael Mitzenmacher | |
| Pages: 144-150 | |
| doi>10.1145/383962.384004 | |
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this draw-back when the ...
| A hierarchy of conditions for consensus solvability | |
| Achour Mostefaoui, Sergio Rajsbaum, Michel Raynal, Matthieu Roy | |
| Pages: 151-160 | |
| doi>10.1145/383962.384006 | |
In a previous paper we introduced the condition-based approach, consisting of identifying sets of input vectors, called conditions, for which there exists an asynchronous protocol solving consensus despite the occurrence of up to f ...
| The concurrency hierarchy, and algorithms for unbounded concurrency | |
| Eli Gafni, Michael Merritt, Gadi Taubenfeld | |
| Pages: 161-169 | |
| doi>10.1145/383962.384008 | |
We study wait-free computation using (read/write) shared memory under a range of assumptions on the arrival pattern of processes. We distinguish first between bounded and infinite arrival patterns, and further distinguish these models by restricting ...
| On scalable and efficient distributed failure detectors | |
| Indranil Gupta, Tushar D. Chandra, Germán S. Goldszmidt | |
| Pages: 170-179 | |
| doi>10.1145/383962.384010 | |
Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In ...
| Average probe complexity in quorum systems | |
| Yehuda Hassin, David Peleg | |
| Pages: 180-189 | |
| doi>10.1145/383962.384014 | |
This paper discusses the probe complexity of randomized algorithms and the deterministic average case probe complexity for some classes of non-dominated coteries, including majority, crumbling walls, tree, wheel and hierarchical quorum systems, and presents ...
| Lock-free reference counting | |
| David L. Detlefs, Paul A. Martin, Mark Moir, Guy L. Steele, Jr. | |
| Pages: 190-199 | |
| doi>10.1145/383962.384016 | |
Assuming the existence of garbage collection makes it easier to design implementations of concurrent data structures. However, this assumption limits their applicability. We present a methodology that, for a significant class of data structures, allows ...
| Detecting distributed cycles of garbage in large-scale systems | |
| Fabrice Le Fessant | |
| Pages: 200-209 | |
| doi>10.1145/383962.384018 | |
| The IceCube approach to the reconciliation of divergent replicas | |
| Anne-Marie Kermarrec, Antony Rowstron, Marc Shapiro, Peter Druschel | |
| Pages: 210-218 | |
| doi>10.1145/383962.384020 | |
We describe a novel approach to log-based reconciliation called IceCube. It is general and is parameterised by application and object semantics. IceCube considers more flexible orderings and is designed to ease the burden of reconciliation on the application ...
| Exploiting event stream interpretation in publish-subscribe systems | |
| Yuanyuan Zhao, Rob Strom | |
| Pages: 219-228 | |
| doi>10.1145/383962.384023 | |
Publish-subscribe messaging middleware typically offers limited and low-level options for quality of service, such as best-effort delivery versus reliable delivery, or ordered versus unordered. We propose a new, high-level approach to specifying quality ...
| Replicated condition monitoring | |
| Yongqiang Huang, Hector Garcia-Molina | |
| Pages: 229-237 | |
| doi>10.1145/383962.384026 | |
A condition monitoring system tracks real-world variables and alerts users when a predefined condition becomes true, e.g., when stock price drops, or when a nuclear reactor over-heats. Replication of monitoring servers can reduce the probability that ...
| Computing property-preserving behaviour abstractions from trace reductions: abstraction-based verification of linear-time properties under fairness | |
| Simon St. James, Ulrich Ultes-Nitsche | |
| Pages: 238-245 | |
| doi>10.1145/383962.384027 | |
Weakly continuation-closed abstractions are known to preserve properties satisfied within fairness, i.e. linear-time temporal properties under an abstract notion of fairness. Being defined on the complete behaviour of a distributed system, weakly continuation-closed ...
| Reliability and performance of hierarchical RAID with multiple controllers | |
| Sung Hoon Baek, Bong Wan Kim, Eui Joung Joung, Chong Won Park | |
| Pages: 246-254 | |
| doi>10.1145/383962.384036 | |
Redundant arrays of inexpensive disks (RAID) offer fault tolerance against disk failures. However a storage system having more disks suffers from less reliability and performance. A RAID architecture tolerating multiple disk failures shows severe performance ...
| Distributed multi-broadcast in unknown radio networks | |
| Andrea E. F. Clementi, Angelo Monti, Riccardo Silvestri | |
| Pages: 255-264 | |
| doi>10.1145/383962.384040 | |
One of the most frequent tasks in multi-hop synchronous radio networks is the multi-broadcast operation: it consists in performing r independent message broadcasts through a network of n nodes. We investigate the case in which messages ...
| Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks | |
| Li Li, Joseph Y. Halpern, Paramvir Bahl, Yi-Min Wang, Roger Wattenhofer | |
| Pages: 264-273 | |
| doi>10.1145/383962.384043 | |
The topology of a wireless multi-hop network can be controlled by varying the transmission power at each node. In this paper, we give a detailed analysis of a cone-based distributed topology control algorithm. This algorithm, introduced in [16], does ...
| Practical multi-candidate election system | |
| Olivier Baudron, Pierre-Alain Fouque, David Pointcheval, Jacques Stern, Guillaume Poupard | |
| Pages: 274-283 | |
| doi>10.1145/383962.384044 | |
The aim of electronic voting schemes is to provide a set of protocols that allow voters to cast ballots while a group of authorities collect the votes and output the final tally. In this paper we describe a practical multi-candidate election scheme that ...
| An optimally robust hybrid mix network | |
| Markus Jakobsson, Ari Juels | |
| Pages: 284-292 | |
| doi>10.1145/383962.384046 | |
We present a mix network that achieves efficient integration of public-key and symmetric-key operations. This hybrid mix network is capable of natural processing of arbitrarily long input elements, and is fast in both practical and asymptotic ...
| Selective private function evaluation with applications to private statistics | |
| Ran Canetti, Yuval Ishai, Ravi Kumar, Michael K. Reiter, Ronitt Rubinfeld, Rebecca N. Wright | |
| Pages: 293-304 | |
| doi>10.1145/383962.384047 | |
Motivated by the application of private statistical analysis of large databases, we consider the problem of selective private function evaluation (SPFE). In this problem, a client interacts with one or more servers holding copies of a database ...
| Optimal scheduling for disconnected cooperation | |
| Grzegorz Greg Malewicz, Alexander Russell, Alex Shvartsman | |
| Pages: 305-307 | |
| doi>10.1145/383962.384048 | |
We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection ...
| Adding networks | |
| Panagiota Fatourou, Maurice Herlihy | |
| Pages: 308-310 | |
| doi>10.1145/383962.384049 | |
An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter. We give a lower bound showing that adding networks have inherently high latency. We ...
| Randomized shared queues | |
| Hyunyoung Lee, Jennifer L. Welch | |
| Pages: 311-313 | |
| doi>10.1145/383962.384050 | |
This paper presents a specification of a randomized shared queue that can lose some elements or return them out of order (not in FIFO), shows that the specification can be implemented over the probabilistic quorum algorithm of [4, 3], and analyzes the ...
| Dynamic input/output automata, a formal model for dynamic systems | |
| Paul C. Attie, Nancy A. Lynch | |
| Pages: 314-316 | |
| doi>10.1145/383962.384051 | |
We present a mathematical state-machine model, the Dynamic I/O Automaton (DIOA) model, for defining and analyzing dynamic systems of interacting components. The systems we consider are dynamic in two senses: (1) components can be created ...
| A framework for semantic reasoning about Byzantine quorum systems | |
| Evelyn Pierce, Lorenzo Alvisi | |
| Pages: 317-319 | |
| doi>10.1145/383962.384052 | |
We have defined a class of shared variables called TS-variables that includes those implemented by the various Byzantine quorum system constructions of Malkhi and Reiter, and developed a number of definitions and theorems enabling us to reason ...
| An efficient communication strategy for ad-hoc mobile networks | |
| Ioannis Chatzigiannakis, Sotiris Nikoletseas, Paul Spirakis | |
| Pages: 320-322 | |
| doi>10.1145/383962.384053 | |
| Correction: practical implementations of non-blocking synchronization primitives | |
| Mark Moir | |
| Page: 323 | |
| doi>10.1145/383962.384054 | |
I describe a problem with an algorithm in my previous paper “Practical Implementations of Synchronization Primitives”, and its correction.