Optimal Load Balancing with Locality Constraints

Published: 15 June 2021

Abstract

Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph in which left nodes correspond to job types, right nodes correspond to servers, and each edge indicates that a job type can be served by a server. The edges thus represent locality constraints: a job of a given type can only be served at servers that hold the required data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies, Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue in which ties are broken in favor of the fastest servers. Under a "well-connected" graph condition, we show that JFSQ and JFIQ are asymptotically optimal in mean response time as the number of servers goes to infinity. In addition to asymptotic optimality, we obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.
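To make the two dispatching rules concrete, here is a minimal Python sketch, assuming each server's state is summarized by its number of jobs (queued plus in service) and a known service rate. The function names `jfsq`, `jfiq`, and `random_bipartite_graph` are hypothetical. Two details are assumptions of this sketch rather than statements from the abstract: JFIQ falls back to a uniformly random compatible server when none is idle (as in the original Join-the-Idle-Queue), and the random graph adds each job-type/server edge independently with probability `p`. The construction analyzed in the paper may differ.

```python
import random
from typing import Dict, List, Sequence


def jfsq(compatible: Sequence[int], queue_len: Sequence[int],
         rate: Sequence[float]) -> int:
    """Join-the-Fastest-of-the-Shortest-Queue (sketch).

    Among the servers compatible with the arriving job's type, keep those with
    the fewest jobs, then pick the one with the highest service rate.
    """
    shortest = min(queue_len[s] for s in compatible)
    candidates = [s for s in compatible if queue_len[s] == shortest]
    # max() returns the first maximal element, so servers with equal rates
    # are resolved by their order in `compatible` (an arbitrary choice here).
    return max(candidates, key=lambda s: rate[s])


def jfiq(compatible: Sequence[int], queue_len: Sequence[int],
         rate: Sequence[float], rng: random.Random) -> int:
    """Join-the-Fastest-of-the-Idle-Queue (sketch).

    Send the job to the fastest idle compatible server, if one exists.
    """
    idle = [s for s in compatible if queue_len[s] == 0]
    if idle:
        return max(idle, key=lambda s: rate[s])
    # Assumption: with no idle compatible server, fall back to a uniformly
    # random compatible server, as in the original Join-the-Idle-Queue.
    return rng.choice(list(compatible))


def random_bipartite_graph(num_types: int, num_servers: int, p: float,
                           rng: random.Random) -> Dict[int, List[int]]:
    """Hypothetical random construction: each (job type, server) edge is
    present independently with probability p. Returns job type -> servers."""
    graph: Dict[int, List[int]] = {}
    for j in range(num_types):
        servers = [s for s in range(num_servers) if rng.random() < p]
        if not servers:
            # Ensure every job type can be served by at least one server.
            servers = [rng.randrange(num_servers)]
        graph[j] = servers
    return graph


if __name__ == "__main__":
    rng = random.Random(0)
    rate = [1.0, 2.0, 2.0, 0.5]   # heterogeneous service rates
    queue_len = [0, 1, 0, 0]      # jobs at each server (queued + in service)
    graph = {0: [0, 1, 2], 1: [1, 3]}
    print(jfsq(graph[0], queue_len, rate))       # servers 0 and 2 tie; 2 is faster -> 2
    print(jfiq(graph[1], queue_len, rate, rng))  # server 3 is the only idle one -> 3
    print(random_bipartite_graph(3, 6, 0.5, rng))
```

Storing the graph as an adjacency list from job type to compatible servers means each dispatching decision inspects only the servers that hold the needed data or model, which is exactly where the locality constraint enters.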

Wentao Weng, et al. Proc. ACM Meas. Anal. Comput. Syst., Vol. 4, No. 3, Article 45. Publication date: December 2020.
