
Hyper-Scalable JSQ with Sparse Feedback

Published: 26 March 2019

Abstract

Load balancing algorithms play a vital role in enhancing performance in data centers and cloud networks. Due to the massive size of these systems, scalability challenges, and especially the communication overhead associated with load balancing mechanisms, have emerged as major concerns. Motivated by these issues, we introduce and analyze a novel class of load balancing schemes where the various servers provide occasional queue updates to guide the load assignment.

We show that the proposed schemes strongly outperform JSQ(d) strategies with comparable communication overhead per job, and can achieve a vanishing waiting time in the many-server limit with just one message per job, just like the popular JIQ scheme. However, the proposed schemes are particularly geared towards the sparse feedback regime with fewer than one message per job, where they outperform corresponding sparsified JIQ versions.

We investigate fluid limits for synchronous updates as well as asynchronous exponential update intervals. The fixed point of the fluid limit is identified in the latter case, and used to derive the queue length distribution. We also demonstrate that in the ultra-low feedback regime the mean stationary waiting time tends to a constant in the synchronous case, but grows without bound in the asynchronous case.
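
To make the setting concrete, the following is a minimal simulation sketch of one possible reading of the abstract, not the authors' implementation. It assumes a single dispatcher that keeps a queue length estimate per server, sends each job to the server with the lowest estimate, and increments that estimate; each server reports its true queue length after exponential intervals (the asynchronous case). The routing rule, the function name `simulate_sparse_feedback`, and all parameter names and default values are illustrative assumptions.

```python
import random


def simulate_sparse_feedback(num_servers=20, load=0.7, messages_per_job=0.5,
                             num_events=200_000, seed=1):
    """Uniformized Markov chain sketch of routing on stale queue estimates.

    Assumptions (not taken from the paper): unit-rate exponential service,
    Poisson arrivals, and a dispatcher that joins the smallest estimate.
    Returns (mean jobs per server, observed messages per job).
    """
    rng = random.Random(seed)
    queues = [0] * num_servers      # true queue lengths
    estimates = [0] * num_servers   # dispatcher's possibly stale view

    arrival_rate = load * num_servers
    service_rate = 1.0 * num_servers                 # potential departures
    update_rate = messages_per_job * arrival_rate    # total feedback rate
    total_rate = arrival_rate + service_rate + update_rate

    jobs = messages = 0
    in_system = 0    # running total of jobs over all servers
    queue_sum = 0    # accumulates in_system at every event epoch

    for _ in range(num_events):
        u = rng.random() * total_rate
        if u < arrival_rate:
            # Arrival: route to the lowest estimate, then bump that estimate.
            target = min(range(num_servers), key=estimates.__getitem__)
            queues[target] += 1
            estimates[target] += 1
            in_system += 1
            jobs += 1
        elif u < arrival_rate + service_rate:
            # Potential departure: only takes effect if the server is busy.
            server = rng.randrange(num_servers)
            if queues[server] > 0:
                queues[server] -= 1
                in_system -= 1
        else:
            # Feedback message: a random server reports its true queue length.
            server = rng.randrange(num_servers)
            estimates[server] = queues[server]
            messages += 1
        queue_sum += in_system

    return queue_sum / (num_events * num_servers), messages / jobs


if __name__ == "__main__":
    mean_jobs, msgs_per_job = simulate_sparse_feedback()
    print(f"mean jobs per server: {mean_jobs:.3f}")
    print(f"messages per job:     {msgs_per_job:.3f}")
```

Lowering `messages_per_job` below one moves the sketch into the sparse feedback regime discussed above. The simulation only illustrates the mechanics and makes no claim about reproducing the paper's results.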

