
Achieving Zero Asymptotic Queueing Delay for Parallel Jobs

Published: 15 June 2021

Abstract

Zero queueing delay is highly desirable in large-scale computing systems. Existing work has shown that it can be asymptotically achieved by using the celebrated Power-of-d-choices (pod) policy with a probe overhead $d = \omega\left(\frac{\log N}{1-\lambda}\right)$, and it is impossible when $d = O\left(\frac{1}{1-\lambda}\right)$, where $N$ is the number of servers and $\lambda$ is the load of the system. However, these results are based on the model where each job is an indivisible unit, which does not capture the parallel structure of jobs in today's predominant parallel computing paradigm. This paper thus considers a model where each job consists of a batch of parallel tasks. Under this model, we propose a new notion of zero (asymptotic) queueing delay that requires the job delay under a policy to approach the job delay given by the max of its tasks' service times, i.e., the job delay assuming its tasks entered service right upon arrival. This notion quantifies the effect of queueing on a job level for jobs consisting of multiple tasks, and thus deviates from the conventional zero queueing delay for single-task jobs in the literature. We show that zero queueing delay for parallel jobs can be achieved using the batch-filling policy (a variant of the celebrated pod policy) with a probe overhead $d = \omega\left(\frac{1}{(1-\lambda)\log k}\right)$ in the sub-Halfin-Whitt heavy-traffic regime, where $k$ is the number of tasks in each job and $k$ properly scales with $N$ (the number of servers). This result demonstrates that for parallel jobs, zero queueing delay can be achieved with a smaller probe overhead. We also establish an impossibility result: we show that zero queueing delay cannot be achieved if $d = \exp\left(o\left(\frac{\log N}{\log k}\right)\right)$. Simulation results are provided to demonstrate the consistency between numerical results and theoretical results under reasonable settings, and to investigate gaps in the theoretical analysis.
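To make the dispatching rule and the delay benchmark concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not spell out the batch-filling rule, so the sketch assumes a common formulation: for a job with k tasks and probe overhead d, the dispatcher probes about kd servers chosen uniformly at random and places the tasks one at a time on the currently shortest probed queue, updating that queue's length after each placement. The function names `batch_filling_assign` and `ideal_job_delay`, and the rounding of kd to an integer, are hypothetical choices made for illustration.

```python
import heapq
import math
import random


def batch_filling_assign(queue_lengths, k, d, rng=random):
    """Place the k tasks of one job and return the chosen server indices.

    Assumed rule (not stated in the abstract): probe ceil(k * d) distinct
    servers uniformly at random, then repeatedly send the next task to the
    shortest probed queue, incrementing that queue's length each time.
    queue_lengths is modified in place.
    """
    n = len(queue_lengths)
    # Rounding and clamping of the probe count are illustrative assumptions.
    num_probes = max(1, min(n, math.ceil(k * d)))
    probed = rng.sample(range(n), num_probes)

    # Min-heap keyed on (queue length, server index).
    heap = [(queue_lengths[s], s) for s in probed]
    heapq.heapify(heap)

    placement = []
    for _ in range(k):
        length, server = heapq.heappop(heap)
        placement.append(server)
        queue_lengths[server] = length + 1
        heapq.heappush(heap, (length + 1, server))
    return placement


def ideal_job_delay(task_service_times):
    """Job delay if every task entered service right upon arrival."""
    return max(task_service_times)


# Example: k * d equals the number of servers, so every server is probed
# and the outcome does not depend on the random sample.
queues = [3, 0, 0, 5, 1, 0]
print(batch_filling_assign(queues, k=3, d=2))  # [1, 2, 5]
print(queues)                                  # [3, 1, 1, 5, 1, 1]
print(ideal_job_delay([0.4, 1.7, 0.9]))        # 1.7
```

Under this reading, zero queueing delay means that the delay a job actually experiences under the policy approaches the value returned by `ideal_job_delay` as the system grows.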
