DOI: 10.1145/502034.502048
Article

Resilient overlay networks

Published: 21 October 2001

Abstract

A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.

Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths. RON's routing mechanism was able to detect, recover, and route around all of them, in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.
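
The path-selection idea described in the abstract (send directly, or detour through at most one other RON node) can be pictured with a small sketch. This is not the authors' implementation: it assumes latency is the only metric, that probe results are already available in a dictionary, and that a failed path is recorded as `None`. The names `best_path`, `count_directed_paths`, and `latency_ms` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical probe table: latency_ms[(src, dst)] holds the latest latency
# estimate for the direct Internet path, or None if the path appears down.
Latency = Dict[Tuple[str, str], Optional[float]]


def count_directed_paths(n: int) -> int:
    """Number of ordered (src, dst) pairs among n overlay nodes."""
    return n * (n - 1)


def best_path(nodes: List[str], latency_ms: Latency,
              src: str, dst: str) -> Optional[Tuple[List[str], float]]:
    """Pick the lowest-latency route from src to dst, considering the direct
    path and every detour through exactly one intermediate overlay node.
    Returns (path, total_latency) or None if no candidate is usable."""
    candidates: List[Tuple[List[str], float]] = []

    # Candidate 1: the direct Internet path, if probes say it is up.
    direct = latency_ms.get((src, dst))
    if direct is not None:
        candidates.append(([src, dst], direct))

    # Candidates 2..k: one-hop detours through another overlay node.
    for hop in nodes:
        if hop in (src, dst):
            continue
        first = latency_ms.get((src, hop))
        second = latency_ms.get((hop, dst))
        if first is not None and second is not None:
            candidates.append(([src, hop, dst], first + second))

    if not candidates:
        return None
    return min(candidates, key=lambda c: c[1])


# Illustrative data only: the direct A->B path is down, so the sketch
# falls back to the detour through C.
nodes = ["A", "B", "C"]
latency_ms = {("A", "B"): None, ("A", "C"): 20.0, ("C", "B"): 30.0}
print(best_path(nodes, latency_ms, "A", "B"))
print(count_directed_paths(12))
```

With twelve nodes, `count_directed_paths(12)` gives 132 ordered node pairs, which matches the 132 measured paths quoted in the abstract. Limiting the search to a single intermediate node keeps the candidate set linear in the number of nodes, in line with the abstract's observation that one intermediate hop is sufficient in most cases.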

References

[1] ANDERSEN, D. G. Resilient Overlay Networks. Master's thesis, Massachusetts Institute of Technology, May 2001.
[2] BALAKRISHNAN, H., SESHAN, S., STEMM, M., AND KATZ, R. Analyzing Stability in Wide-Area Network Performance. In Proc. ACM SIGMETRICS (Seattle, WA, June 1997), pp. 2-12.
[3] CHANDRA, B., DAHLIN, M., GAO, L., AND NAYATE, A. End-to-end WAN Service Availability. In Proc. 3rd USITS (San Francisco, CA, 2001), pp. 97-108.
[4] CLARK, D. Policy Routing in Internet Protocols. Internet Engineering Task Force, May 1989. RFC 1102.
[5] COLLINS, A. The Detour Framework for Packet Rerouting. Master's thesis, University of Washington, Oct. 1998.
[6] ERIKSSON, H. Mbone: The Multicast Backbone. Communications of the ACM 37, 8 (1994), 54-60.
[7] FLOYD, S., HANDLEY, M., PADHYE, J., AND WIDMER, J. Equation-Based Congestion Control for Unicast Applications. In Proc. ACM SIGCOMM (Stockholm, Sweden, Sept. 2000), pp. 43-54.
[8] GOYAL, M., GUERIN, R., AND RAJAN, R. Predicting TCP Throughput From Non-invasive Data. (Unpublished, http://www.seas.upenn.edu:8080/~guerin/publications/TCP_model.pdf).
[9] GUARDINI, I., FASANO, P., AND GIRARDI, G. IPv6 Operational Experience within the 6bone. In Proc. Internet Society (INET) Conf. (Yokohama, Japan, July 2000). http://www.isoc.org/inet2000/cdproceedings/1e/1e_1.htm.
[10] HAGENS, R., HALL, N., AND ROSE, M. Use of the Internet as a Subnetwork for Experimentation with the OSI Network Layer. Internet Engineering Task Force, Feb 1989. RFC 1070.
[11] KHANNA, A., AND ZINKY, J. The Revised ARPANET Routing Metric. In Proc. ACM SIGCOMM (Austin, TX, Sept. 1989), pp. 45-56.
[12] LABOVITZ, C., AHUJA, A., BOSE, A., AND JAHANIAN, F. Delayed Internet Routing Convergence. In Proc. ACM SIGCOMM (Stockholm, Sweden, September 2000), pp. 175-187.
[13] LABOVITZ, C., MALAN, R., AND JAHANIAN, F. Internet Routing Instability. IEEE/ACM Transactions on Networking 6, 5 (1998), 515-526.
[14] MCCANNE, S., AND JACOBSON, V. The BSD Packet Filter: A New Architecture for User-Level Packet Capture. In Proc. Winter '93 USENIX Conference (San Diego, CA, Jan. 1993), pp. 259-269.
[15] The North American Network Operators' Group mailing list archive. http://www.cctec.com/maillists/nanog/.
[16] PADHYE, J., FIROIU, V., TOWSLEY, D., AND KUROSE, J. Modeling TCP Throughput: A Simple Model and its Empirical Validation. In Proc. ACM SIGCOMM (Vancouver, Canada, September 1998), pp. 303-323.
[17] PARTRIDGE, C. Using the Flow Label Field in IPv6. Internet Engineering Task Force, 1995. RFC 1809.
[18] PAXSON, V. End-to-End Routing Behavior in the Internet. In Proc. ACM SIGCOMM '96 (Stanford, CA, Aug. 1996), pp. 25-38.
[19] PAXSON, V. End-to-End Internet Packet Dynamics. In Proc. ACM SIGCOMM (Cannes, France, Sept. 1997), pp. 139-152.
[20] POSTEL, J. B. Transmission Control Protocol. Internet Engineering Task Force, September 1981. RFC 793.
[21] REKHTER, Y., AND LI, T. A Border Gateway Protocol 4 (BGP-4). Internet Engineering Task Force, 1995. RFC 1771.
[22] SAVAGE, S., ANDERSON, T., ET AL. Detour: A Case for Informed Internet Routing and Transport. IEEE Micro 19, 1 (Jan. 1999), 50-59.
[23] SAVAGE, S., COLLINS, A., HOFFMAN, E., SNELL, J., AND ANDERSON, T. The End-to-End Effects of Internet Path Selection. In Proc. ACM SIGCOMM (Boston, MA, 1999), pp. 289-299.
[24] SESHAN, S., STEMM, M., AND KATZ, R. H. SPAND: Shared Passive Network Performance Discovery. In Proc. 1st USITS (Monterey, CA, December 1997), pp. 135-146.
[25] SHAIKH, A., KALAMPOUKAS, L., VARMA, A., AND DUBE, R. Routing Stability in Congested Networks: Experimentation and Analysis. In Proc. ACM SIGCOMM (Stockholm, Sweden, 2000), pp. 163-174.
[26] TOUCH, J., AND HOTZ, S. The X-Bone. In Proc. 3rd Global Internet Mini-Conference (Sydney, Australia, Nov. 1998), pp. 75-83.

Reviews

Alexandru Petrescu

In a world where peer-to-peer networks flourish, and where Internet path congestion and oscillations are daily events, the need for new efficient routing mechanisms is more and more pressing. Andersen, Balakrishnan, Kaashoek, and Morris present resilient overlay networks (RONs) as groups of nodes, distributed over large areas, whose users agree to engage in cooperative networking, and whose paths are formed over actual Internet routing paths. The main characteristics of a RON are that the number of participating nodes is small (up to 50), and that communication between sites follows paths that circumvent temporary failures of the actual Internet paths. This is achieved by continuous probing of the direct links between sites, and by employing a new link-state routing protocol (different than open shortest path first (OSPF) and border gateway protocol (BGP)). Simply put, when the underlying segments of a direct path between two RON nodes fail (BGP failures are often cited), the overlay network redirects the entire path toward an intermediary RON node, apparently lengthening the entire path, but still offering connectivity. RON nodes have addresses different than Internet protocol version 4 (IPv4) or Internet protocol version 6 (IPv6). Actual experiments, performed by the authors, included a 16-node deployment in the USA and Europe. As expected, another distinguishing trait of RON networking is the ability of applications at the uppermost layer to make routing decisions (traditionally, routing and application layers are separated, with the inconvenience of application interruption when routing fails). The authors pay detailed attention to motivating the overlaying routing approach. 
Not only do they describe an actual implementation, including simulation, test deployment, and performance measurements, but they also address, in a separate discussion section, tough questions on potential violation of the deployed Internet policy routing (presumably due to tunneling), limited RON size and scalability, and network address translation (NAT) traversal. Finally, one aspect whose treatment seems to be overlooked is one that lies at the very heart of a routing protocol: loop avoidance. While a description of path lookup and building by using link-state exchanges is given, proofs (at least conceptual) of loop avoidance are not mentioned at all. The paper provides a comprehensive bibliographical list. Many of the references are correctly used as explanations of the current BGP routing instabilities, as well as of how these influence transmission control protocol (TCP) applications; thus, they offer perfect motivations for the need to overlay network routing.

Online Computing Reviews Service

Information & Contributors

Information

Published In

SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles
October 2001
254 pages
ISBN:1581133898
DOI:10.1145/502034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SOSP01: 18th Symposium on Operating System Principles
October 21 - 24, 2001
Banff, Alberta, Canada

Acceptance Rates

SOSP '01 Paper Acceptance Rate 17 of 85 submissions, 20%;
Overall Acceptance Rate 174 of 961 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 119
  • Downloads (Last 6 weeks): 16
Reflects downloads up to 28 Nov 2024

Citations

Cited By

  • (2024) ROND: Rethinking Overlay Network Design with Underlay Network Awareness. Proceedings of the ACM on Networking 2:CoNEXT2 (1-22). DOI: 10.1145/3656298. Online publication date: 13-Jun-2024
  • (2024) Xaminer: An Internet Cross-Layer Resilience Analysis Tool. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:1 (1-37). DOI: 10.1145/3639042. Online publication date: 21-Feb-2024
  • (2024) HPETC: History Priority Enhanced Tensor Completion for Network Distance Measurement. IEEE Transactions on Parallel and Distributed Systems 35:6 (1012-1028). DOI: 10.1109/TPDS.2023.3274305. Online publication date: Jun-2024
  • (2024) Robust Routing Made Easy: Reinforcing Networks Against Non-Benign Faults. IEEE/ACM Transactions on Networking 32:1 (283-297). DOI: 10.1109/TNET.2023.3283184. Online publication date: Feb-2024
  • (2024) PolyNet: Cost- and Performance-Aware Multi-Criteria Link Selection in Software-Defined Edge-to-Cloud Overlay Networks. 2024 IEEE 10th International Conference on Network Softwarization (NetSoft) (127-135). DOI: 10.1109/NetSoft60951.2024.10588920. Online publication date: 24-Jun-2024
  • (2023) A Novel Multipath Transmission Scheme for Information-Centric Networking. Future Internet 15:2 (80). DOI: 10.3390/fi15020080. Online publication date: 17-Feb-2023
  • (2023) DIT: A Dynamic Bandwidth Isolated Transmission System for Large-Scale Inter-DC Wireless Communication Network. Wireless Communications and Mobile Computing 2023 (1-17). DOI: 10.1155/2023/7209414. Online publication date: 20-Feb-2023
  • (2023) Prevention of Security Against Attacks in Order to Balance the Syatem Using Optimisation. 2023 International Conference on Power Energy, Environment & Intelligent Control (PEEIC) (1077-1082). DOI: 10.1109/PEEIC59336.2023.10451286. Online publication date: 19-Dec-2023
  • (2023) Fast Reroute Algorithms for Satellite Network With Segment Routing. IEEE Access 11 (133509-133520). DOI: 10.1109/ACCESS.2023.3335988. Online publication date: 2023
  • (2022) It takes two to tango. Proceedings of the 21st ACM Workshop on Hot Topics in Networks (174-180). DOI: 10.1145/3563766.3564107. Online publication date: 14-Nov-2022
