
Zero Queueing for Multi-Server Jobs

Published: 22 February 2021

Abstract

Cloud computing today is dominated by multi-server jobs: jobs that request multiple servers simultaneously and hold all of those servers for the duration of the job. Multi-server jobs add considerable complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even characterizing the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular, we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which may be applicable to a broader range of problems in queueing systems.
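
The blocking effect described above can be illustrated with a small event-driven sketch. This is not the paper's analysis or model specification: the function name `simulate_fcfs`, the tuple layout `(arrival_time, servers_needed, service_time)`, and the three-job example are all assumptions chosen for illustration, and the service times are fixed numbers rather than draws from a distribution. The sketch only shows how strict first-come-first-served order lets a job that does not fit hold back a later job that would fit.

```python
import heapq
from collections import deque


def simulate_fcfs(num_servers, jobs):
    """Simulate FCFS service of multi-server jobs (illustrative sketch).

    jobs: list of (arrival_time, servers_needed, service_time), sorted by
    arrival_time. Returns the start time of each job, in input order.
    """
    for _, need, _ in jobs:
        if need > num_servers:
            raise ValueError("a job requests more servers than exist")

    free = num_servers            # servers currently idle
    waiting = deque()             # indices of queued jobs, in arrival order
    departures = []               # min-heap of (finish_time, servers_released)
    start = [None] * len(jobs)

    def admit(now):
        nonlocal free
        # Only the head of the queue may start. If it does not fit,
        # every job behind it waits, even one that would fit.
        while waiting and jobs[waiting[0]][1] <= free:
            i = waiting.popleft()
            _, need, service = jobs[i]
            free -= need
            start[i] = now
            heapq.heappush(departures, (now + service, need))

    for i, (arrival, _, _) in enumerate(jobs):
        # Release servers for every job that finishes by this arrival.
        while departures and departures[0][0] <= arrival:
            t, released = heapq.heappop(departures)
            free += released
            admit(t)
        waiting.append(i)
        admit(arrival)

    # Drain the remaining departures after the last arrival.
    while departures:
        t, released = heapq.heappop(departures)
        free += released
        admit(t)
    return start


# Hypothetical example: 4 servers, three jobs.
jobs = [
    (0, 3, 10),  # takes 3 of 4 servers until time 10
    (1, 2, 5),   # needs 2 servers, only 1 is free, so it queues
    (2, 1, 1),   # would fit in the idle server, but is behind the job above
]
start_times = simulate_fcfs(4, jobs)
waits = [s - a for s, (a, _, _) in zip(start_times, jobs)]
print(start_times, waits)
```

In this example one server sits idle from time 0 to time 10 while two jobs wait, which is the "blocking later arrivals and leaving servers idle" behavior the abstract refers to. With 6 servers instead of 4, the same three jobs all start on arrival and nothing queues.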
