Cerberus: The Power of Choices in Datacenter Topology Design - A Throughput Perspective

Published: 15 December 2021

Abstract

The bandwidth and latency requirements of modern datacenter applications have led researchers to propose various topology designs using static, dynamic demand-oblivious (rotor), and/or dynamic demand-aware switches. However, given the diverse nature of datacenter traffic, there is little consensus about how these designs would fare against each other. In this work, we analyze the throughput of existing topology designs under different traffic patterns and study their unique advantages and potential costs in terms of bandwidth and latency "tax". To overcome the identified inefficiencies, we propose Cerberus, a unified, two-layer leaf-spine optical datacenter design with three topology types. Cerberus systematically matches different traffic patterns with their most suitable topology type: e.g., latency-sensitive flows are transmitted via a static topology, all-to-all traffic via a rotor topology, and elephant flows via a demand-aware topology. We show analytically and in simulations that Cerberus can improve throughput significantly compared to alternative approaches and operate datacenters at higher loads while being throughput-proportional.
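
The matching of traffic to topology types described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Flow` type, the `choose_topology` function, and the `ELEPHANT_BYTES` cutoff are hypothetical names and values chosen only for illustration, and treating every remaining flow as all-to-all-like traffic is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    STATIC = "static"              # fixed links, suited to latency-sensitive flows
    ROTOR = "rotor"                # demand-oblivious, suited to all-to-all traffic
    DEMAND_AWARE = "demand-aware"  # reconfigured per demand, suited to elephant flows


@dataclass(frozen=True)
class Flow:
    size_bytes: int
    latency_sensitive: bool


# Hypothetical cutoff for calling a flow an "elephant"; not taken from the paper.
ELEPHANT_BYTES = 10_000_000


def choose_topology(flow: Flow, elephant_bytes: int = ELEPHANT_BYTES) -> Topology:
    """Pick a topology type for a flow, following the abstract's examples."""
    if flow.latency_sensitive:
        return Topology.STATIC
    if flow.size_bytes >= elephant_bytes:
        return Topology.DEMAND_AWARE
    # Assumption: everything else is treated like all-to-all traffic.
    return Topology.ROTOR
```

For example, under these assumptions `choose_topology(Flow(size_bytes=2_000, latency_sensitive=True))` selects the static topology, while a 50 MB flow that is not latency-sensitive is sent to the demand-aware topology.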
