Abstract
Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Edges thus represent locality constraints: a job can be served only at servers that hold the required data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies, Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue in which ties are broken in favor of the fastest servers. Under a "well-connected" graph condition, we show that JFSQ and JFIQ are asymptotically optimal in mean response time as the number of servers goes to infinity. In addition to asymptotic optimality, we obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.
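The two dispatching rules described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy graph, and the data layout are hypothetical, and the fallback used by `jfiq` when no compatible server is idle (a uniformly random compatible server, as in classical Join-the-Idle-Queue) is an assumption that the abstract does not spell out.

```python
import random

def jfsq(compatible, queue_len, rate):
    """Join-the-Fastest-of-the-Shortest-Queue (sketch).

    Among the servers compatible with the job type, pick one with the
    shortest queue; ties in queue length go to the fastest server.
    """
    return min(compatible, key=lambda s: (queue_len[s], -rate[s]))

def jfiq(compatible, queue_len, rate, rng=random):
    """Join-the-Fastest-of-the-Idle-Queue (sketch).

    If some compatible server is idle, pick the fastest idle one.
    """
    idle = [s for s in compatible if queue_len[s] == 0]
    if idle:
        return max(idle, key=lambda s: rate[s])
    # Assumption: with no idle compatible server, fall back to a
    # uniformly random compatible server (classical JIQ behavior).
    return rng.choice(list(compatible))

# Hypothetical toy instance: two job types, three servers.
graph = {"a": [0, 1], "b": [1, 2]}   # job type -> compatible servers
rate = [1.0, 2.0, 0.5]               # heterogeneous service rates
queue_len = [1, 1, 0]                # current queue lengths

print(jfsq(graph["a"], queue_len, rate))  # servers 0 and 1 tie on length; the faster is 1
print(jfsq(graph["b"], queue_len, rate))  # server 2 has the shortest queue
print(jfiq(graph["b"], queue_len, rate))  # server 2 is the only idle compatible server
```

In both rules the locality constraint enters only through the `compatible` list, so the same functions apply to any bipartite compatibility graph.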
Wentao Weng, et al. Optimal Load Balancing with Locality Constraints. Proc. ACM Meas. Anal. Comput. Syst., Vol. 4, No. 3, Article 45. Publication date: December 2020.