Resilient overlay networks

Published: 21 October 2001
    Abstract

    A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.

    Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths. RON's routing mechanism was able to detect, recover, and route around all of them, in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.
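The routing decision described in the abstract (send directly, or by way of at most one intermediate RON node) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-node topology, and the use of additive latency as the only metric are assumptions made for illustration, whereas the paper speaks of application-specific metrics. The helper `directed_path_count` only reproduces the arithmetic behind the 132 measured paths of a twelve-node RON.

```python
def best_path(src, dst, latency, nodes):
    """Pick the cheapest route from src to dst using the direct path
    or at most one intermediate node.

    latency[(a, b)] is a measured cost for the directed path a -> b.
    A missing entry or float("inf") means the path is treated as down.
    All names here are hypothetical, not taken from the RON code base.
    """
    inf = float("inf")
    # Start with the direct Internet path as the candidate.
    best = (latency.get((src, dst), inf), [src, dst])
    for mid in nodes:
        if mid in (src, dst):
            continue
        # Cost of detouring through one intermediate overlay node.
        cost = latency.get((src, mid), inf) + latency.get((mid, dst), inf)
        if cost < best[0]:
            best = (cost, [src, mid, dst])
    return best


def directed_path_count(n):
    """Number of directed paths among n fully meshed nodes."""
    return n * (n - 1)


# Hypothetical measurements in milliseconds; the direct A -> B path is down.
nodes = ["A", "B", "C"]
latency = {
    ("A", "B"): float("inf"),
    ("A", "C"): 40.0,
    ("C", "B"): 35.0,
}

cost, path = best_path("A", "B", latency, nodes)
print(cost, path)               # 75.0 ['A', 'C', 'B']
print(directed_path_count(12))  # 132
```

Limiting the search to one intermediate node keeps the per-destination work linear in the number of nodes, which matches the abstract's observation that a single intermediate hop is sufficient in most cases.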

    Cited By

    • (2024) Xaminer: An Internet Cross-Layer Resilience Analysis Tool. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:1 (1-37). DOI: 10.1145/3639042. Online publication date: 21-Feb-2024
    • (2024) HPETC: History Priority Enhanced Tensor Completion for Network Distance Measurement. IEEE Transactions on Parallel and Distributed Systems 35:6 (1012-1028). DOI: 10.1109/TPDS.2023.3274305. Online publication date: Jun-2024
    • (2024) Robust Routing Made Easy: Reinforcing Networks Against Non-Benign Faults. IEEE/ACM Transactions on Networking 32:1 (283-297). DOI: 10.1109/TNET.2023.3283184. Online publication date: Feb-2024

    Reviews

    Alexandru Petrescu

    In a world where peer-to-peer networks flourish, and where Internet path congestion and oscillations are daily events, the need for new efficient routing mechanisms is more and more pressing. Andersen, Balakrishnan, Kaashoek, and Morris present resilient overlay networks (RONs) as groups of nodes, distributed over large areas, whose users agree to engage in cooperative networking, and whose paths are formed over actual Internet routing paths. The main characteristics of a RON are that the number of participating nodes is small (up to 50), and that communication between sites follows paths that circumvent temporary failures of the actual Internet paths. This is achieved by continuous probing of the direct links between sites, and by employing a new link-state routing protocol (different than open shortest path first (OSPF) and border gateway protocol (BGP)). Simply put, when the underlying segments of a direct path between two RON nodes fail (BGP failures are often cited), the overlay network redirects the entire path toward an intermediary RON node, apparently lengthening the entire path, but still offering connectivity. RON nodes have addresses different than Internet protocol version 4 (IPv4) or Internet protocol version 6 (IPv6). Actual experiments, performed by the authors, included a 16-node deployment in the USA and Europe. As expected, another distinguishing trait of RON networking is the ability of applications at the uppermost layer to make routing decisions (traditionally, routing and application layers are separated, with the inconvenience of application interruption when routing fails). The authors pay detailed attention to motivating the overlaying routing approach. 
Not only do they describe an actual implementation, including simulation, test deployment, and performance measurements, but they also address, in a separate discussion section, tough questions on potential violation of the deployed Internet policy routing (presumably due to tunneling), limited RON size and scalability, and network address translation (NAT) traversal. Finally, one aspect whose treatment seems to be overlooked is one that lies at the very heart of a routing protocol: loop avoidance. While a description of path lookup and building by using link-state exchanges is given, proofs (at least conceptual) of loop avoidance are not mentioned at all. The paper provides a comprehensive bibliographical list. Many of the references are correctly used as explanations of the current BGP routing instabilities, as well as of how these influence transmission control protocol (TCP) applications; thus, they offer perfect motivations for the need to overlay network routing.

    Online Computing Reviews Service
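The continuous probing that the review mentions can be pictured with a small sketch of a per-path monitor. Everything here is an assumption made for illustration: the class name `LinkMonitor`, the rule of declaring an outage after a fixed number of consecutive lost probes, and the window size are hypothetical and are not taken from the paper or the review.

```python
from collections import deque


class LinkMonitor:
    """Track probe outcomes for one overlay path (hypothetical sketch).

    down_after: consecutive lost probes before the path is treated as down.
    window:     number of recent probes kept for the loss-rate estimate.
    Both thresholds are illustrative assumptions.
    """

    def __init__(self, down_after=4, window=100):
        self.down_after = down_after
        self.results = deque(maxlen=window)
        self.consecutive_losses = 0

    def record(self, reply_received):
        """Record the outcome of one probe."""
        self.results.append(bool(reply_received))
        if reply_received:
            self.consecutive_losses = 0
        else:
            self.consecutive_losses += 1

    @property
    def is_down(self):
        """True once enough probes in a row have gone unanswered."""
        return self.consecutive_losses >= self.down_after

    @property
    def loss_rate(self):
        """Fraction of lost probes within the recent window."""
        if not self.results:
            return 0.0
        return 1.0 - sum(self.results) / len(self.results)


monitor = LinkMonitor(down_after=3)
for reply in (True, False, False, False):
    monitor.record(reply)
print(monitor.is_down, monitor.loss_rate)  # True 0.75
```

A monitor of this kind would feed the routing decision: once a direct path is marked down, the overlay can redirect traffic through an intermediate node until probes succeed again.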


    Published In

    SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles
    October 2001
    254 pages
    ISBN:1581133898
    DOI:10.1145/502034
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    SOSP01: 18th Symposium on Operating Systems Principles
    October 21 - 24, 2001
    Banff, Alberta, Canada

    Acceptance Rates

    SOSP '01 Paper Acceptance Rate 17 of 85 submissions, 20%;
    Overall Acceptance Rate 131 of 716 submissions, 18%
