Abstract

Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add considerable complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even characterizing the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular, we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which may be applicable to a broader range of problems in queueing systems.
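The blocking effect described above can be illustrated with a minimal sketch. It is not the paper's model or analysis: the function name `fcfs_admit`, its parameters, and the example numbers are hypothetical, and the sketch only shows one admission step under strict first-come-first-served order, ignoring job classes and service times.

```python
from collections import deque


def fcfs_admit(total_servers, busy_servers, waiting):
    """Admit waiting jobs under strict FCFS (illustrative assumption only).

    total_servers: number of servers in the system
    busy_servers:  number of servers currently held by running jobs
    waiting:       server needs of queued jobs, in arrival order
    Returns (admitted needs, needs still waiting, idle servers left).
    """
    free = total_servers - busy_servers
    queue = deque(waiting)
    admitted = []
    # Only the head-of-line job may enter service; if it does not fit,
    # every job behind it is blocked, even one that would fit.
    while queue and queue[0] <= free:
        need = queue.popleft()
        free -= need
        admitted.append(need)
    return admitted, list(queue), free


# Hypothetical numbers: 10 servers, 6 busy, waiting jobs need 3, 4 and 1 servers.
admitted, still_waiting, idle = fcfs_admit(10, 6, [3, 4, 1])
print(admitted, still_waiting, idle)  # [3] [4, 1] 1
```

In this example the job needing 4 servers does not fit into the single remaining server, so the job needing 1 server waits behind it while that server sits idle.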
Zero Queueing for Multi-Server Jobs
SIGMETRICS '21: Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems