Abstract
Recently, researchers have started exploring the design of route protection schemes that ensure networks can sustain traffic demand without congestion under failures. Existing approaches focus on ensuring that worst-case performance over simultaneous f-failure scenarios is acceptable. Unfortunately, even a single bad scenario may render these schemes unable to protect against any f-failure scenario. In this paper, we present Lancet, a system designed to handle most failures when not all can be tackled. Lancet comprises three components: (i) an algorithm to analyze which failure scenarios the network can intrinsically handle, which provides a benchmark for any protection routing scheme and guides the design of new schemes; (ii) an approach to efficiently design protection schemes for more general failure sets than all f-failure scenarios; and (iii) techniques to determine which of combinatorially many scenarios to design for. Our evaluations with real topologies and validations on an emulation testbed show that Lancet outperforms a worst-case approach by protecting against many more scenarios, and can even match the scenarios that can be handled by optimal network response.
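To make the notion of an f-failure scenario concrete, the following is a minimal illustrative sketch, not Lancet's algorithm (which the abstract does not detail). It assumes a hypothetical five-link toy topology, a single demand, and a brute-force enumeration of all f-link failures, using a max-flow check as a simple stand-in for "what the network can intrinsically handle". The names `LINKS`, `max_flow`, and `classify_scenarios` are invented for this example.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy topology (an assumption, not taken from the paper):
# undirected links mapped to their capacities.
LINKS = {
    ("a", "b"): 10,
    ("a", "c"): 10,
    ("b", "d"): 10,
    ("c", "d"): 10,
    ("b", "c"): 5,
}


def max_flow(links, src, dst):
    """Edmonds-Karp max flow; each undirected link becomes two directed arcs."""
    residual = {}
    for (u, v), cap in links.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v][u] = residual[v].get(u, 0) + cap
    if src not in residual or dst not in residual:
        return 0  # an endpoint lost all of its links
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return total
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = dst
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = dst
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def classify_scenarios(links, src, dst, demand, f):
    """Split all f-link failure scenarios by whether the demand still fits."""
    feasible, infeasible = [], []
    for failed in combinations(sorted(links), f):
        surviving = {l: c for l, c in links.items() if l not in failed}
        if max_flow(surviving, src, dst) >= demand:
            feasible.append(failed)
        else:
            infeasible.append(failed)
    return feasible, infeasible


if __name__ == "__main__":
    for f in (1, 2):
        ok, bad = classify_scenarios(LINKS, "a", "d", demand=10, f=f)
        print(f"f={f}: {len(ok)} feasible, {len(bad)} infeasible")
```

On this toy network with a demand of 10 from `a` to `d`, every single-link failure is survivable, but 4 of the 10 two-link failures are not (for example, losing both links at `a`). A design that insists on covering all 2-failure scenarios therefore cannot succeed here, while the remaining 6 scenarios could still be targeted, which is the kind of pruned failure set the abstract argues for. Brute-force enumeration is used only because the example is tiny; the number of scenarios grows combinatorially with f.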
Lancet: Better Network Resilience by Designing for Pruned Failure Sets





