research-article

LIFEGUARD: practical repair of persistent route failures

Published: 13 August 2012
  • Abstract

    The Internet was designed to always find a route if there is a policy-compliant path. However, in many cases, connectivity is disrupted despite the existence of an underlying valid path. The research community has focused on short-term outages that occur during route convergence. There has been less progress on addressing avoidable long-lasting outages. Our measurements show that long-lasting events contribute significantly to overall unavailability.
    To address these problems, we develop LIFEGUARD, a system for automatic failure localization and remediation. LIFEGUARD uses active measurements and a historical path atlas to locate faults, even in the presence of asymmetric paths and failures. Given the ability to locate faults, we argue that the Internet protocols should allow edge ISPs to steer traffic to them around failures, without requiring the involvement of the network causing the failure. Although the Internet does not explicitly support this functionality today, we show how to approximate it using carefully crafted BGP messages. LIFEGUARD employs a set of techniques to reroute around failures with low impact on working routes. Deploying LIFEGUARD on the Internet, we find that it can effectively route traffic around an AS without causing widespread disruption.
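As a rough illustration of what "carefully crafted BGP messages" could look like, the sketch below models AS-path poisoning under standard BGP loop prevention: the origin lists the AS it wants to avoid inside its own announcement's AS path, so that AS discards the route and other networks must choose paths that do not traverse it. This is a toy model under stated assumptions, not LIFEGUARD's implementation; the function names are hypothetical and the AS numbers come from the range reserved for documentation.

```python
from typing import List


def poisoned_as_path(origin_asn: int, avoid_asns: List[int]) -> List[int]:
    """Build an AS path that begins and ends with the origin AS and
    lists the ASes to avoid in between (hypothetical helper)."""
    # Ending with the origin keeps the apparent originator unchanged.
    return [origin_asn, *avoid_asns, origin_asn]


def accepts_route(receiver_asn: int, as_path: List[int]) -> bool:
    """Model BGP loop prevention: a router rejects any route whose
    AS path already contains its own AS number."""
    return receiver_asn not in as_path


if __name__ == "__main__":
    # Documentation-range ASNs, chosen only for illustration.
    ORIGIN, FAILING, OTHER = 64496, 64500, 64501

    path = poisoned_as_path(ORIGIN, [FAILING])
    print("announced AS path:", path)
    print("failing AS accepts:", accepts_route(FAILING, path))
    print("other AS accepts:", accepts_route(OTHER, path))
```

In this model the AS being avoided drops the announcement while every other AS still accepts it, which is the effect an edge network needs in order to route around a failure without the failing network's cooperation.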

      Published In

      ACM SIGCOMM Computer Communication Review  Volume 42, Issue 4
      Special October issue SIGCOMM '12
      October 2012
      538 pages
      ISSN:0146-4833
      DOI:10.1145/2377677

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 August 2012
      Published in SIGCOMM-CCR Volume 42, Issue 4


      Author Tags

      1. availability
      2. bgp
      3. internet
      4. measurement
      5. outages
      6. repair
      7. routing

      Qualifiers

      • Research-article

      Cited By

      • (2024) Xaminer: An Internet Cross-Layer Resilience Analysis Tool. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:1 (1-37). DOI: 10.1145/3639042. Online publication date: 21-Feb-2024
      • (2022) A first step towards checking BGP routes in the dataplane. Proceedings of the ACM SIGCOMM Workshop on Future of Internet Routing & Addressing (50-57). DOI: 10.1145/3527974.3545723. Online publication date: 22-Aug-2022
      • (2022) Automatic Inference of BGP Location Communities. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:1 (1-23). DOI: 10.1145/3508023. Online publication date: 28-Feb-2022
      • (2021) Resilience Evaluation of Multi-Path Routing against Network Attacks and Failures. Electronics 10:11 (1240). DOI: 10.3390/electronics10111240. Online publication date: 24-May-2021
      • (2021) BGPeek-a-Boo: Active BGP-based Traceback for Amplification DDoS Attacks. 2021 IEEE European Symposium on Security and Privacy (EuroS&P) (423-439). DOI: 10.1109/EuroSP51992.2021.00036. Online publication date: Sep-2021
      • (2020) Observing BGP route poisoning in the wild. Proceedings of the SIGCOMM '20 Poster and Demo Sessions (94-96). DOI: 10.1145/3405837.3411403. Online publication date: 10-Aug-2020
      • (2020) TQAIOD: A Backup Technique to Surpassing the Internet Outage. 2020 International Conference on Computer Science and Software Engineering (CSASE) (255-258). DOI: 10.1109/CSASE48920.2020.9142118. Online publication date: Apr-2020
      • (2017) A Churn for the Better. Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies (81-87). DOI: 10.1145/3143361.3143386. Online publication date: 28-Nov-2017
      • (2017) The Waterfall of Liberty. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2037-2052). DOI: 10.1145/3133956.3134075. Online publication date: 30-Oct-2017
      • (2017) Evaluating cloud deployment models based on security in EHR system. 2017 International Conference on Engineering and Technology (ICET) (1-6). DOI: 10.1109/ICEngTechnol.2017.8308142. Online publication date: Aug-2017
