DOI: 10.5555/2930611.2930638
Article

HUG: multi-resource fairness for correlated and elastic demands

Published: 16 March 2016

    Abstract

    In this paper, we study how to optimally provide isolation guarantees in multi-resource environments, such as public clouds, where a tenant's demands on different resources (links) are correlated. Unlike prior work such as Dominant Resource Fairness (DRF), which assumes static, fixed demands, we consider elastic demands. Our approach generalizes canonical max-min fairness to the multi-resource setting with correlated demands, and extends DRF to elastic demands. We consider two natural optimization objectives: the isolation guarantee from a tenant's viewpoint and system utilization (work conservation) from an operator's perspective. We prove that in non-cooperative environments such as public cloud networks, there is a strong tradeoff between the optimal isolation guarantee and work conservation when demands are elastic. Worse, when demands are inelastic, work conservation can decrease network utilization instead of improving it. We identify the root cause of the tradeoff and present a provably optimal allocation algorithm, High Utilization with Guarantees (HUG), that achieves the maximum attainable network utilization without sacrificing the optimal isolation guarantee, strategy-proofness, or other useful properties of DRF. In cooperative environments such as private datacenter networks, HUG achieves both the optimal isolation guarantee and work conservation. Analyses, simulations, and experiments show that HUG provides better isolation guarantees, higher system utilization, and better tenant-level performance than its counterparts.
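
To make "correlated demands" and "isolation guarantee" concrete, the following is a minimal sketch of a DRF-style equal-dominant-share allocation across links. It is not the HUG algorithm, which the abstract does not spell out. It only illustrates the kind of baseline guarantee the abstract refers to, under the assumption that every tenant's demand vector is scaled so that its largest entry is 1 and that every link has unit capacity. The function name `equal_dominant_share` and the two-tenant demand values are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def equal_dominant_share(demands):
    """Return (share, allocations) for a DRF-style equal dominant share.

    demands maps a tenant name to its per-link demand vector. All vectors
    must have the same length. Every link is assumed to have capacity 1
    (an assumption of this sketch, not something stated in the abstract).
    """
    num_links = len(next(iter(demands.values())))

    # Scale each tenant's vector so its dominant (largest) demand is 1.
    normalized = {}
    for tenant, vector in demands.items():
        if len(vector) != num_links:
            raise ValueError("all demand vectors must have the same length")
        peak = max(vector)
        if peak <= 0:
            raise ValueError("each tenant needs a positive demand on some link")
        normalized[tenant] = [Fraction(x) / Fraction(peak) for x in vector]

    # Total normalized demand on each link if every tenant had share 1.
    load = [sum(v[i] for v in normalized.values()) for i in range(num_links)]

    # The most loaded link limits the share that all tenants can get equally.
    share = 1 / max(load)

    # Each tenant receives its correlated vector scaled by the common share.
    allocations = {t: [share * x for x in v] for t, v in normalized.items()}
    return share, allocations


# Hypothetical example: two tenants sharing two unit-capacity links.
example_demands = {
    "A": [Fraction(1, 2), Fraction(1)],
    "B": [Fraction(1), Fraction(1, 6)],
}
share, allocations = equal_dominant_share(example_demands)
print("equal dominant share:", share)
for tenant, vector in allocations.items():
    print(tenant, [str(x) for x in vector])
```

In this example the first link is fully used while the second is not, which is the kind of leftover capacity that raises the work-conservation question discussed in the abstract.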


    Published In

    NSDI'16: Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation
    March 2016
    699 pages
    ISBN: 9781931971294

    Sponsors

    • VMware
    • Google Inc.
    • Microsoft Research
    • Facebook

    Publisher

    USENIX Association

    United States
