Abstract
Zero queueing delay is highly desirable in large-scale computing systems. Existing work has shown that it can be asymptotically achieved by using the celebrated Power-of-d-choices (Pod) policy with a probe overhead $d = \omega\left(\frac{\log N}{1-\lambda}\right)$, and that it is impossible when $d = O\left(\frac{1}{1-\lambda}\right)$, where $N$ is the number of servers and $\lambda$ is the load of the system. However, these results are based on a model where each job is an indivisible unit, which does not capture the parallel structure of jobs in today's predominant parallel computing paradigm. This paper thus considers a model where each job consists of a batch of parallel tasks. Under this model, we propose a new notion of zero (asymptotic) queueing delay that requires the job delay under a policy to approach the job delay given by the max of its tasks' service times, i.e., the job delay assuming its tasks entered service right upon arrival. This notion quantifies the effect of queueing on a job level for jobs consisting of multiple tasks, and thus deviates from the conventional zero queueing delay for single-task jobs in the literature. We show that zero queueing delay for parallel jobs can be achieved using the batch-filling policy (a variant of the celebrated Pod policy) with a probe overhead $d = \omega\left(\frac{1}{(1-\lambda)\log k}\right)$ in the sub-Halfin-Whitt heavy-traffic regime, where $k$ is the number of tasks in each job and $k$ properly scales with $N$ (the number of servers). This result demonstrates that for parallel jobs, zero queueing delay can be achieved with a smaller probe overhead. We also establish an impossibility result: we show that zero queueing delay cannot be achieved if $d = \exp\left(o\left(\frac{\log N}{\log k}\right)\right)$. Simulation results are provided to demonstrate the consistency between numerical results and theoretical results under reasonable settings, and to investigate gaps in the theoretical analysis.
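As an illustration of the two ideas in the abstract, the following Python sketch shows (i) the benchmark job delay, taken as the max of the tasks' service times, and (ii) one plausible reading of batch-filling. The abstract does not spell out the policy, so the sketch assumes it probes $kd$ servers chosen uniformly at random and places the $k$ tasks one at a time on the currently shortest probed queue. The function names, the integer-valued `d`, and the toy queue state are hypothetical and are not taken from the paper.

```python
import heapq
import random


def ideal_job_delay(service_times):
    """Job delay if every task entered service on arrival: the max task service time."""
    return max(service_times)


def batch_filling_assign(queue_lengths, k, d, rng):
    """Place k tasks among k*d probed servers (assumed reading of batch-filling).

    Each task goes to the shortest queue among the probed servers, counting
    tasks of the same job that were already placed. Updates queue_lengths in
    place and returns the chosen server index for each task.
    Assumes d is a positive integer and k*d <= len(queue_lengths).
    """
    probed = rng.sample(range(len(queue_lengths)), k * d)  # probe without replacement
    heap = [(queue_lengths[s], s) for s in probed]
    heapq.heapify(heap)
    placement = []
    for _ in range(k):
        length, server = heapq.heappop(heap)           # shortest probed queue
        placement.append(server)
        queue_lengths[server] = length + 1
        heapq.heappush(heap, (length + 1, server))     # re-insert with updated length
    return placement


rng = random.Random(0)
queues = [0] * 8 + [3] * 8   # toy state: 8 idle servers, 8 servers with 3 queued tasks
placement = batch_filling_assign(queues, k=4, d=4, rng=rng)  # k*d = 16 probes all servers
```

In this toy state every server is probed, so the four tasks land on four distinct idle servers and the job's delay equals `ideal_job_delay` of its task service times, which is the situation the zero-queueing-delay notion asks for asymptotically.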
Achieving Zero Asymptotic Queueing Delay for Parallel Jobs