research-article

CONGA: distributed congestion-aware load balancing for datacenters

Published: 17 August 2014
  • Abstract

    We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any TCP modifications. CONGA has been implemented in custom ASICs as part of a new datacenter fabric. In testbed experiments, CONGA has 5x better flow completion times than ECMP even with a single link failure and achieves 2-8x better throughput than MPTCP in Incast scenarios. Further, the Price of Anarchy for CONGA is provably small in Leaf-Spine topologies; hence CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds. Our main thesis is that datacenter fabric load balancing is best done in the network, and requires global schemes such as CONGA to handle asymmetry.
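The mechanism the abstract outlines (split flows into flowlets, track per-path congestion, place each new flowlet using feedback from remote switches) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the ASIC implementation: the class `LeafSwitch`, the 500 microsecond `FLOWLET_GAP_US` threshold, the 0.0 to 1.0 congestion scale, and the choice to score a path by the larger of its local and remote estimates are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical inactivity gap: a packet arriving later than this after the
# previous packet of the same flow is treated as starting a new flowlet.
FLOWLET_GAP_US = 500


@dataclass
class LeafSwitch:
    """Toy model of a leaf switch doing congestion-aware flowlet placement."""
    num_uplinks: int
    # flow key -> (uplink currently used, arrival time of last packet in us)
    flowlets: dict = field(default_factory=dict)
    # locally measured congestion per uplink (assumed scale: 0.0 idle, 1.0 full)
    local: list = field(default_factory=list)
    # destination leaf -> per-uplink congestion reported back by that leaf
    remote: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.local:
            self.local = [0.0] * self.num_uplinks

    def on_feedback(self, dst_leaf, uplink, metric):
        """Record congestion feedback from a remote leaf for one path."""
        table = self.remote.setdefault(dst_leaf, [0.0] * self.num_uplinks)
        table[uplink] = metric

    def pick_uplink(self, dst_leaf):
        """Choose the uplink whose worst (local or remote) estimate is lowest."""
        remote = self.remote.get(dst_leaf, [0.0] * self.num_uplinks)
        cost = [max(l, r) for l, r in zip(self.local, remote)]
        return min(range(self.num_uplinks), key=cost.__getitem__)

    def route(self, flow, dst_leaf, now_us):
        """Return the uplink for a packet of `flow` arriving at `now_us`."""
        entry = self.flowlets.get(flow)
        if entry is not None and now_us - entry[1] <= FLOWLET_GAP_US:
            uplink = entry[0]                    # same flowlet: keep its path
        else:
            uplink = self.pick_uplink(dst_leaf)  # new flowlet: re-balance
        self.flowlets[flow] = (uplink, now_us)
        return uplink
```

In this model a flow only changes path at a flowlet boundary, so packets inside a burst stay in order, which is consistent with the abstract's claim that no TCP modifications are needed. For example, after `sw = LeafSwitch(num_uplinks=2)` and `sw.on_feedback("leaf-B", 0, 0.9)`, a call to `sw.route("f1", "leaf-B", 0)` selects uplink 1, and later packets of `f1` stay there until a gap longer than `FLOWLET_GAP_US` occurs.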

      Published In

      ACM SIGCOMM Computer Communication Review, Volume 44, Issue 4
      SIGCOMM'14
      October 2014
      672 pages
      ISSN: 0146-4833
      DOI: 10.1145/2740070

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. datacenter fabric
      2. distributed
      3. load balancing

      Qualifiers

      • Research-article

      Cited By

      • (2024) Impossibility Results for Data-Center Routing with Congestion Control and Unsplittable Flows. Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing (358-368). DOI: 10.1145/3662158.3662777. Online publication date: 17-Jun-2024
      • (2024) Learned Load Balancing. Theoretical Computer Science (114611). DOI: 10.1016/j.tcs.2024.114611. Online publication date: May-2024
      • (2024) HeavySeparation: A Generic framework for stream processing faster and more accurate. Computer Communications 223 (36-43). DOI: 10.1016/j.comcom.2024.04.036. Online publication date: Jul-2024
      • (2024) Efficient Multi-tunnel Flow Scheduling for Traffic Engineering. Algorithms and Architectures for Parallel Processing (456-473). DOI: 10.1007/978-981-97-0859-8_27. Online publication date: 27-Feb-2024
      • (2024) Adaptive Routing for Datacenter Networks Using Ant Colony Optimization. Algorithms and Architectures for Parallel Processing (290-309). DOI: 10.1007/978-981-97-0798-0_17. Online publication date: 1-Mar-2024
      • (2024) Enabling Traffic-Differentiated Load Balancing for Datacenter Networks. Algorithms and Architectures for Parallel Processing (250-269). DOI: 10.1007/978-981-97-0798-0_15. Online publication date: 1-Mar-2024
      • (2023) CrossBal: Data and Control Plane Cooperation for Efficient and Scalable Network Load Balancing. 2023 19th International Conference on Network and Service Management (CNSM) (1-9). DOI: 10.23919/CNSM59352.2023.10327790. Online publication date: 30-Oct-2023
      • (2023) PDLB: Path Diversity-aware Load Balancing with adaptive granularity in data center networks. Journal of Cloud Computing 12:1. DOI: 10.1186/s13677-023-00548-x. Online publication date: 7-Dec-2023
      • (2023) Mistill: Distilling Distributed Network Protocols From Examples. IEEE Transactions on Network and Service Management 20:4 (4110-4125). DOI: 10.1109/TNSM.2023.3263529. Online publication date: 1-Dec-2023
      • (2023) Dynamic and Load-Aware Flowlet for Load-Balancing in Data Center Networks. 2023 IEEE International Performance, Computing, and Communications Conference (IPCCC) (204-209). DOI: 10.1109/IPCCC59175.2023.10253875. Online publication date: 17-Nov-2023