ABSTRACT

Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In this paper, we look at quantifying the optimal scalability, in terms of network load (in messages per second, with messages having a size limit), of distributed, complete failure detectors as a function of application-specified requirements. These requirements are 1) quick failure detection by some non-faulty process, and 2) accuracy of failure detection. We assume a crash-recovery (non-Byzantine) failure model, and a network model that is probabilistically unreliable (w.r.t. message deliveries and process failures). First, we characterize, under certain independence assumptions, the optimum worst-case network load imposed by any failure detector that achieves an application's requirements. We then discuss why traditional heartbeating schemes are inherently unscalable according to the optimal load. We also present a randomized, distributed failure detector algorithm that imposes an equal expected load per group member. This protocol satisfies the application-defined constraints of completeness and accuracy, and speed of detection on average. It imposes a network load that differs from the optimal by a sub-optimality factor that is much lower than that for traditional distributed heartbeating schemes. Moreover, this sub-optimality factor does not vary with group size (for large groups).
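
The scalability contrast drawn in the abstract can be illustrated with a rough message-count sketch. It is not the protocol from the paper: it assumes lossless delivery, a hypothetical all-to-all heartbeating baseline in which every member sends one heartbeat to every other member per period, and a hypothetical randomized scheme in which each member probes one uniformly chosen peer per period and receives one acknowledgement. All function names are invented for this illustration.

```python
import random
from typing import List


def all_to_all_heartbeat_load(n: int, period_s: float) -> float:
    """Messages per second if each of n members heartbeats every other member once per period."""
    return n * (n - 1) / period_s


def random_probe_load(n: int, period_s: float) -> float:
    """Messages per second if each member sends one probe per period and gets one ack back."""
    return 2 * n / period_s


def simulate_probe_round(n: int, rng: random.Random) -> List[int]:
    """Simulate one period of the assumed random-probe scheme (requires n >= 2).

    Returns the number of messages each member sends or receives in the round.
    """
    handled = [0] * n
    for i in range(n):
        # Pick a peer uniformly at random among the other n - 1 members.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        handled[i] += 2  # member i sends the probe and receives the ack
        handled[j] += 2  # member j receives the probe and sends the ack
    return handled


if __name__ == "__main__":
    n = 100
    print(all_to_all_heartbeat_load(n, 1.0))  # grows quadratically with n
    print(random_probe_load(n, 1.0))          # grows linearly with n
    per_member = simulate_probe_round(n, random.Random(0))
    print(sum(per_member) / n)                # average messages handled per member
```

In this toy model the per-member load of the random-probe scheme stays constant on average as the group grows, while the heartbeating baseline's total load grows quadratically. Message loss, detection time, and accuracy, which the paper's analysis accounts for, are deliberately left out.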

AUTHORS



Indranil Gupta

Bibliometrics: publication history
Publication years: 1995-2016
Publication count: 90
Citation count: 605
Available for download: 36
Downloads (6 weeks): 162
Downloads (12 months): 1,019
Downloads (cumulative): 12,050
Average downloads per article: 334.72
Average citations per article: 6.72


Tushar D. Chandra

Bibliometrics: publication history
Publication years: 1990-2016
Publication count: 22
Citation count: 1,451
Available for download: 12
Downloads (6 weeks): 104
Downloads (12 months): 584
Downloads (cumulative): 13,514
Average downloads per article: 1,126.17
Average citations per article: 65.95


Germán S. Goldszmidt

Bibliometrics: publication history
Publication years: 1988-2010
Publication count: 22
Citation count: 254
Available for download: 9
Downloads (6 weeks): 5
Downloads (12 months): 71
Downloads (cumulative): 3,313
Average downloads per article: 368.11
Average citations per article: 11.55


CITED BY

33 Citations

PUBLICATION

Title: PODC '01 Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
Chairmen: Ajay Kshemkalyani, Univ. of Illinois at Chicago, Chicago
Nir Shavit, Tel-Aviv Univ. and Sun Labs
Pages: 170-179
Publication Date: 2001-08-01
Sponsors: SIGACT, ACM Special Interest Group on Algorithms and Computation Theory
SIGOPS, ACM Special Interest Group on Operating Systems
Publisher: ACM, New York, NY, USA ©2001
ISBN: 1-58113-383-9 Order Number: 536010 doi>10.1145/383962.384010
Conference: PODC, Principles of Distributed Computing
Paper Acceptance Rate: 39 of 118 submissions, 33%
Overall Acceptance Rate: 942 of 3,170 submissions, 30%
Year      Submitted  Accepted  Rate
PODC '94  133        67        50%
PODC '95  132        49        37%
PODC '96  117        69        59%
PODC '97  149        46        31%
PODC '00  117        32        27%
PODC '01  118        39        33%
PODC '02  149        43        29%
PODC '03  226        51        23%
PODC '04  314        75        24%
PODC '06  138        33        24%
PODC '07  204        32        16%
PODC '08  187        84        45%
PODC '09  110        27        25%
PODC '10  179        39        22%
PODC '11  129        34        26%
PODC '12  142        61        43%
PODC '13  145        37        26%
PODC '14  141        39        28%
PODC '15  191        45        24%
PODC '16  149        40        27%
Overall   3,170      942       30%

APPEARS IN
Networking
Software
Theory

Table of Contents

Proceedings of the twentieth annual ACM symposium on Principles of distributed computing

Lamport on mutual exclusion: 27 years of planting seeds
James H. Anderson
Pages: 3-12
doi>10.1145/383962.383967

Mutual exclusion is a topic that Leslie Lamport has returned to many times throughout his career. This article, which is being written in celebration of Lamport's sixtieth birthday, is an attempt to survey some of his many contributions to research on ...

The ABCD's of Paxos
Butler Lampson
Page: 13
doi>10.1145/383962.383969

We explain how consensus is used to implement replicated state machines, the general mechanism for fault-tolerance. We describe an abstract version of Lamport's Paxos algorithm for asynchronous consensus. Then we derive the Byzantine, classic, and disk ...

Sticks and stones: a coding scheme for parameterized verification
Amir Pnueli
Page: 14
doi>10.1145/383962.383971

We consider the problem of Uniform Algorithmic Verification of Parameterized Systems, which requires establishing in a single verification effort the correctness of a parameterized family of systems for any value of the parameter. As has been observed ...

Leslie Lamport's properties and actions
Martín Abadi
Page: 15
doi>10.1145/383962.383973

Since the 1970s, Leslie Lamport has done substantial work on specification and verification methods. This work might be regarded as complementary to his other celebrated research on concurrency and distributed computing, perhaps sometimes even as secondary. ...

Implementing atomic objects in a dynamic environment
Nancy Lynch
Page: 16
doi>10.1145/383962.383977

This talk will describe a new algorithm for implementing atomic objects in distributed settings where processes may fail (by stopping), and may also join and leave voluntarily. This strategy builds on Lamport's Paxos algorithm, and also on work by [Yeger-Lotem, ...

The LaTeX legacy: 2.09 and all that
Chris Rowley
Pages: 17-25
doi>10.1145/383962.383978

The second edition of The Manual [23] begins: `LaTeX is a system for typesetting documents. Its first widely available version, mysteriously numbered 2.09, appeared in 1985.' It is too early for a complete critical assessment of the impact of LaTeX ...

On beyond registers: wait-free readable objects
Maurice Herlihy
Pages: 26-42
doi>10.1145/383962.383979

Leslie Lamport was the first to pose many of the fundamental questions about synchronization that drive much of our community's research, even today. In this paper, we revisit some of Lamport's classic questions in a modern context. In particular, we ...

Restoration by path concatenation: fast recovery of MPLS paths
Anat Bremler-Barr, Yehuda Afek, Haim Kaplan, Edith Cohen, Michael Merritt
Pages: 43-52
doi>10.1145/383962.383980

A new general theory about restoration of network paths is first introduced. The theory pertains to restoration of shortest paths in a network following failure, e.g., we prove that a shortest path in a network after removing k edges is ...

Computing almost shortest paths
Michael Elkin
Pages: 53-62
doi>10.1145/383962.383983

We study the s-sources almost shortest paths (shortly, s-ASP) problem. Given an unweighted graph G = (V, E), and a subset S ⊆ V of s nodes, the goal is to compute almost shortest paths between ...

Distributed MST for constant diameter graphs
Zvi Lotker, Boaz Patt-Shamir, David Peleg
Pages: 63-71
doi>10.1145/383962.383984

This paper considers the problem of distributively constructing a minimum-weight spanning tree (MST) for graphs of constant diameter in the bounded-messages model, where each message can contain at most B bits for some parameter B. It is ...

String realizers of posets with applications to distributed computing
Vijay K. Garg, Chakarat Skawratananond
Pages: 72-80
doi>10.1145/383962.383988

In this paper, we show the connection between vector clocks used in distributed computing and dimension theory of partially ordered sets. Based on this connection, we provide lower bounds on the number of coordinates for timestamping events in a distributed ...

On the generalized dining philosophers problem
Oltea Mihaela Herescu, Catuscia Palamidessi
Pages: 81-89
doi>10.1145/383962.383994

We consider a generalization of the dining philosophers problem to arbitrary connection topologies. We focus on symmetric, fully distributed systems, and we address the problem of guaranteeing progress and lockout-freedom, even in presence of adversary ...

An improved lower bound for the time complexity of mutual exclusion
James H. Anderson, Yong-Jik Kim
Pages: 90-99
doi>10.1145/383962.383996

We establish a lower bound of Ω(log N/log log N) remote memory references for N-process mutual exclusion algorithms based on reads, writes, or comparison primitives. Our bound improves an earlier bound of Ω(log log N/log ...

A note on group mutual exclusion
Vassos Hadzilacos
Pages: 100-106
doi>10.1145/383962.383997

Group mutual exclusion is a natural problem, formulated by Joung in 1998, that generalises the classical mutual exclusion problem. In group mutual exclusion a process requests a “session” before entering its critical section; processes are ...

Nearly optimal perfectly-periodic schedules
Amotz Bar-Noy, Aviv Nisgav, Boaz Patt-Shamir
Pages: 107-116
doi>10.1145/383962.383998

We consider the problem of scheduling a set of jobs on a single shared resource using time-multiplexing. A perfectly-periodic schedule is one where resource time is divided into equal size “time-slots” quanta, and each job gets a time ...

The do-all problem in broadcast networks
Bogdan S. Chlebus, Dariusz R. Kowalski, Andrzej Lingas
Pages: 117-127
doi>10.1145/383962.384000

The problem of performing t tasks in a distributed system on p failure-prone processors is one of the fundamental problems in distributed computing. If the tasks are similar and independent and the processors communicate by sending messages ...

Competitive concurrent distributed queuing
Maurice Herlihy, Srikanta Tirthapura, Roger Wattenhofer
Pages: 127-133
doi>10.1145/383962.384001

Distributed queuing is a fundamental problem in distributed computing, arising in a variety of applications. The challenge in designing a distributed queuing algorithm is to minimize message traffic and delay. This paper gives a novel competitive analysis ...

Bandwidth constrained placement in a WAN
Arun Venkataramani, Phoebe Weidmann, Mike Dahlin
Pages: 134-143
doi>10.1145/383962.384002

In this paper, we examine the bandwidth-constrained placement problem, focusing on trade-offs appropriate for wide area network (WAN) environments. The goal is to place copies of objects at a collection of distributed caches to minimize expected ...

Compressed bloom filters
Michael Mitzenmacher
Pages: 144-150
doi>10.1145/383962.384004

A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this drawback when the ...

A hierarchy of conditions for consensus solvability
Achour Mostefaoui, Sergio Rajsbaum, Michel Raynal, Matthieu Roy
Pages: 151-160
doi>10.1145/383962.384006

In a previous paper we introduced the condition-based approach, consisting of identifying sets of input vectors, called conditions, for which there exists an asynchronous protocol solving consensus despite the occurrence of up to f ...

The concurrency hierarchy, and algorithms for unbounded concurrency
Eli Gafni, Michael Merritt, Gadi Taubenfeld
Pages: 161-169
doi>10.1145/383962.384008

We study wait-free computation using (read/write) shared memory under a range of assumptions on the arrival pattern of processes. We distinguish first between bounded and infinite arrival patterns, and further distinguish these models by restricting ...

On scalable and efficient distributed failure detectors
Indranil Gupta, Tushar D. Chandra, Germán S. Goldszmidt
Pages: 170-179
doi>10.1145/383962.384010

Process groups in distributed applications and services rely on failure detectors to detect process failures completely, and as quickly, accurately, and scalably as possible, even in the face of unreliable message deliveries. In ...

Average probe complexity in quorum systems
Yehuda Hassin, David Peleg
Pages: 180-189
doi>10.1145/383962.384014

This paper discusses the probe complexity of randomized algorithms and the deterministic average case probe complexity for some classes of non-dominated coteries, including majority, crumbling walls, tree, wheel and hierarchical quorum systems, and presents ...

Lock-free reference counting
David L. Detlefs, Paul A. Martin, Mark Moir, Guy L. Steele, Jr.
Pages: 190-199
doi>10.1145/383962.384016

Assuming the existence of garbage collection makes it easier to design implementations of concurrent data structures. However, this assumption limits their applicability. We present a methodology that, for a significant class of data structures, allows ...

Detecting distributed cycles of garbage in large-scale systems
Fabrice Le Fessant
Pages: 200-209
doi>10.1145/383962.384018

The IceCube approach to the reconciliation of divergent replicas
Anne-Marie Kermarrec, Antony Rowstron, Marc Shapiro, Peter Druschel
Pages: 210-218
doi>10.1145/383962.384020

We describe a novel approach to log-based reconciliation called IceCube. It is general and is parameterised by application and object semantics. IceCube considers more flexible orderings and is designed to ease the burden of reconciliation on the application ...

Exploiting event stream interpretation in publish-subscribe systems
Yuanyuan Zhao, Rob Strom
Pages: 219-228
doi>10.1145/383962.384023

Publish-subscribe messaging middleware typically offers limited and low-level options for quality of service, such as best-effort delivery versus reliable delivery, or ordered versus unordered. We propose a new, high-level approach to specifying quality ...

Replicated condition monitoring
Yongqiang Huang, Hector Garcia-Molina
Pages: 229-237
doi>10.1145/383962.384026

A condition monitoring system tracks real-world variables and alerts users when a predefined condition becomes true, e.g., when stock price drops, or when a nuclear reactor over-heats. Replication of monitoring servers can reduce the probability that ...

Computing property-preserving behaviour abstractions from trace reductions: abstraction-based verification of linear-time properties under fairness
Simon St. James, Ulrich Ultes-Nitsche
Pages: 238-245
doi>10.1145/383962.384027

Weakly continuation-closed abstractions are known to preserve properties satisfied within fairness, i.e. linear-time temporal properties under an abstract notion of fairness. Being defined on the complete behaviour of a distributed system, weakly continuation-closed ...

Reliability and performance of hierarchical RAID with multiple controllers
Sung Hoon Baek, Bong Wan Kim, Eui Joung Joung, Chong Won Park
Pages: 246-254
doi>10.1145/383962.384036

Redundant arrays of inexpensive disks (RAID) offer fault tolerance against disk failures. However a storage system having more disks suffers from less reliability and performance. A RAID architecture tolerating multiple disk failures shows severe performance ...

Distributed multi-broadcast in unknown radio networks
Andrea E. F. Clementi, Angelo Monti, Riccardo Silvestri
Pages: 255-264
doi>10.1145/383962.384040

One of the most frequent tasks in multi-hop synchronous radio networks is the multi-broadcast operation: it consists in performing r independent message broadcasts through a network of n nodes. We investigate the case in which messages ...

Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks
Li Li, Joseph Y. Halpern, Paramvir Bahl, Yi-Min Wang, Roger Wattenhofer
Pages: 264-273
doi>10.1145/383962.384043

The topology of a wireless multi-hop network can be controlled by varying the transmission power at each node. In this paper, we give a detailed analysis of a cone-based distributed topology control algorithm. This algorithm, introduced in [16], does ...

Practical multi-candidate election system
Olivier Baudron, Pierre-Alain Fouque, David Pointcheval, Jacques Stern, Guillaume Poupard
Pages: 274-283
doi>10.1145/383962.384044

The aim of electronic voting schemes is to provide a set of protocols that allow voters to cast ballots while a group of authorities collect the votes and output the final tally. In this paper we describe a practical multi-candidate election scheme that ...

An optimally robust hybrid mix network
Markus Jakobsson, Ari Juels
Pages: 284-292
doi>10.1145/383962.384046

We present a mix network that achieves efficient integration of public-key and symmetric-key operations. This hybrid mix network is capable of natural processing of arbitrarily long input elements, and is fast in both practical and asymptotic ...

Selective private function evaluation with applications to private statistics
Ran Canetti, Yuval Ishai, Ravi Kumar, Michael K. Reiter, Ronitt Rubinfeld, Rebecca N. Wright
Pages: 293-304
doi>10.1145/383962.384047

Motivated by the application of private statistical analysis of large databases, we consider the problem of selective private function evaluation (SPFE). In this problem, a client interacts with one or more servers holding copies of a database ...

Optimal scheduling for disconnected cooperation
Grzegorz Greg Malewicz, Alexander Russell, Alex Shvartsman
Pages: 305-307
doi>10.1145/383962.384048

We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection ...

Adding networks
Panagiota Fatourou, Maurice Herlihy
Pages: 308-310
doi>10.1145/383962.384049

An adding network is a distributed data structure that supports a concurrent, lock-free, low-contention implementation of a fetch&add counter. We give a lower bound showing that adding networks have inherently high latency. We ...

Randomized shared queues
Hyunyoung Lee, Jennifer L. Welch
Pages: 311-313
doi>10.1145/383962.384050

This paper presents a specification of a randomized shared queue that can lose some elements or return them out of order (not in FIFO), shows that the specification can be implemented over the probabilistic quorum algorithm of [4, 3], and analyzes the ...

Dynamic input/output automata, a formal model for dynamic systems
Paul C. Attie, Nancy A. Lynch
Pages: 314-316
doi>10.1145/383962.384051

We present a mathematical state-machine model, the Dynamic I/O Automaton (DIOA) model, for defining and analyzing dynamic systems of interacting components. The systems we consider are dynamic in two senses: (1) components can be created ...

A framework for semantic reasoning about Byzantine quorum systems
Evelyn Pierce, Lorenzo Alvisi
Pages: 317-319
doi>10.1145/383962.384052

We have defined a class of shared variables called TS-variables that includes those implemented by the various Byzantine quorum system constructions of Malkhi and Reiter, and developed a number of definitions and theorems enabling us to reason ...

An efficient communication strategy for ad-hoc mobile networks
Ioannis Chatzigiannakis, Sotiris Nikoletseas, Paul Spirakis
Pages: 320-322
doi>10.1145/383962.384053

Correction: practical implementations of non-blocking synchronization primitives
Mark Moir
Page: 323
doi>10.1145/383962.384054

I describe a problem with an algorithm in my previous paper “Practical Implementations of Synchronization Primitives”, and its correction.

The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.