ABSTRACT
Microsecond-scale congestion events, known as microbursts, are a main cause of packet loss and poor application performance in today's datacenters. Given the low network utilization in datacenters, one would expect packet deflection, the in-situ re-routing of packets that arrive at a full buffer to a different port, to prevent packet loss effectively. If deployed naively, however, deflection leads to excessive packet reordering, exacerbated congestion, and head-of-line blocking in switch buffers. In this study, we resolve these challenges by selectively deflecting the packets that cause persistent congestion in the network. To enable this, we augment end-host network stacks with a transport-independent extension that tracks flows and marks them with their remaining bytes. Our in-network deflection component uses this flow size information to re-route packets from flows with more data to send. Finally, an extension to the receive side of end-host stacks restores the correct ordering of packets before passing them to transport and higher-level protocols. We evaluate our design, Vertigo, under diverse datacenter workloads and show that it is effective in managing microbursts under light and heavy loads and when combined with various congestion control algorithms. For example, in a leaf-spine network under 85% load, Vertigo reduces the mean incast query completion time by 3.5x, 3.3x, and 5x compared to ECMP, DRILL, and DIBS, respectively, when using TCP; by 3x, 3.5x, and 4.5x alongside DCTCP; and by 43x, 33x, and 16x when using Swift.
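To make the selective-deflection idea concrete, the following toy sketch illustrates one way the two pieces described above could fit together: a sender that tags each packet with its flow's remaining bytes, and a switch port that, when its buffer is full, re-routes the packet belonging to the flow with the most data left to send. Everything here is an assumption made for illustration and is not taken from the paper: the names `Packet`, `Port`, `mark_flow`, and `forward`, the `remaining_bytes` field, the packet-count buffer capacity, the choice of victim among both queued and arriving packets, and the first-fit choice of alternate port.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: str
    seq: int
    remaining_bytes: int  # hypothetical header field: bytes the flow still has to send


def mark_flow(flow_id, flow_size, mtu=1500):
    """Sender side (assumed): split a flow into packets tagged with remaining bytes."""
    packets, sent, seq = [], 0, 0
    while sent < flow_size:
        sent += min(mtu, flow_size - sent)
        packets.append(Packet(flow_id, seq, flow_size - sent))
        seq += 1
    return packets


class Port:
    """Output port with a buffer measured in packets (a simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def full(self):
        return len(self.queue) >= self.capacity


def forward(pkt, port, alternates):
    """Enqueue pkt; on a full buffer, displace the packet with the most remaining bytes.

    Returns None if nothing was displaced, else (victim, "deflected" | "dropped").
    """
    if not port.full():
        port.queue.append(pkt)
        return None
    # Assumed policy: the victim may be the arrival or any queued packet.
    victim = max(port.queue + [pkt], key=lambda p: p.remaining_bytes)
    if victim is not pkt:
        port.queue.remove(victim)
        port.queue.append(pkt)
    # Assumed policy: use the first alternate port with free buffer space.
    for alt in alternates:
        if not alt.full():
            alt.queue.append(victim)
            return victim, "deflected"
    return victim, "dropped"


# Usage: a short flow arrives at a port already filled by a long flow.
primary, spare = Port(capacity=2), Port(capacity=2)
long_flow = mark_flow("long", 6000)
short_flow = mark_flow("short", 1000)
forward(long_flow[0], primary, [spare])
forward(long_flow[1], primary, [spare])
result = forward(short_flow[0], primary, [spare])
```

In this run the short flow's packet keeps its place in the primary queue while a packet from the long flow is moved to the spare port, which is the behavior the abstract attributes to deflecting flows with more data to send. The sketch omits the receive-side reordering step entirely.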
Burst-tolerant datacenter networks with Vertigo




