Abstract
The control plane of most computer networks runs distributed routing protocols that determine whether and how traffic is forwarded. Errors in the configuration of network control planes frequently take down critical online services, leading to economic damage for service providers and significant hardship for users. Validation via ahead-of-time simulation can help find configuration errors, but such techniques are expensive or even intractable for large industrial networks. We explore the use of abstract interpretation to address this fundamental scaling challenge and find that the right abstractions can reduce the asymptotic complexity of network simulation. Based on this observation, we build a tool called ShapeShifter for reachability analysis. On a suite of 127 production networks from a large cloud provider, ShapeShifter provides an asymptotic improvement in runtime and memory over the state-of-the-art simulator. These gains come with a minimal loss in precision. Our abstract analysis accurately predicts reachability for all destinations for 95% of the networks and for most destinations for the remaining 5%. We also find that abstract interpretation of network control planes not only speeds up existing analyses but also facilitates new kinds of analyses. We illustrate this advantage through a new destination "hijacking" analysis for the Border Gateway Protocol (BGP), the globally deployed routing protocol.
Index Terms
Abstract interpretation of distributed network control planes