
Optimal Service Elasticity in Large-Scale Distributed Systems

Published: 13 June 2017

Abstract

A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve substantial communication overhead and a major implementation burden, or may not be viable at all.

Motivated by these issues, we propose a joint auto-scaling and load balancing scheme that does not require any global queue length information or explicit knowledge of system parameters, yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics of the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of both user-perceived delay performance and energy consumption. Specifically, we prove that both the waiting time of tasks and the relative portion of energy consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations. Extensive simulation experiments corroborate the fluid-limit results and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.
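To make the idea of joint auto-scaling and load balancing with constant per-task communication more concrete, here is a minimal simulation sketch in Python. It is not the scheme analyzed in the paper. The routing and power-down rules below are hypothetical, chosen only to be in the same spirit: the dispatcher keeps no queue-length information, and servers send a message only when they become idle or power down. Assumptions: `n` servers, Poisson arrivals at rate `lam * n`, exponential service times with unit mean, an idle server powers down after an exponentially distributed idle period with rate `nu`, and a standby server needs an exponentially distributed setup time with rate `gamma` to wake up. When no idle server is available, the dispatcher wakes one standby server (if any) and sends the task to a uniformly random server that is busy or in setup. The energy metric assumes that idle, busy and setup servers draw equal power. All names (`simulate`, `lam`, `nu`, `gamma`, the returned keys) are illustrative.

```python
import random


def simulate(n=50, lam=0.7, nu=1.0, gamma=0.5, horizon=100.0, seed=1):
    """Simulate a hypothetical token-based dispatcher with idle power-down.

    Returns the fraction of arrivals that found no idle-on server, the share
    of powered-server time spent idle, and control messages per task.
    """
    rng = random.Random(seed)
    queue = [0] * n            # tasks at each server, including the one in service
    idle_on = set(range(n))    # dispatcher's token set: servers that are on and idle
    busy, setup, off = set(), set(), set()
    arrivals = no_token = messages = 0
    idle_area = powered_area = 0.0   # time integrals of #idle-on and #powered servers
    t = 0.0

    while True:
        r_arr = lam * n
        r_srv = float(len(busy))          # unit-rate service at each busy server
        r_off = nu * len(idle_on)         # idle timeouts
        r_set = gamma * len(setup)        # setup completions
        total = r_arr + r_srv + r_off + r_set
        dt = rng.expovariate(total)
        step = min(dt, horizon - t)       # do not integrate past the horizon
        idle_area += len(idle_on) * step
        powered_area += (n - len(off)) * step
        t += dt
        if t >= horizon:
            break

        u = rng.random() * total          # pick which event fires
        if u < r_arr:                                   # task arrival
            arrivals += 1
            if idle_on:
                s = rng.choice(tuple(idle_on))          # consume one token
                idle_on.remove(s)
            else:
                no_token += 1
                if off:
                    w = rng.choice(tuple(off))          # wake one standby server
                    off.remove(w)
                    setup.add(w)
                s = rng.choice(tuple(busy | setup))     # join a random busy or setup server
            queue[s] += 1
            if s not in setup:
                busy.add(s)
        elif u < r_arr + r_srv:                         # service completion
            s = rng.choice(tuple(busy))
            queue[s] -= 1
            if queue[s] == 0:
                busy.remove(s)
                idle_on.add(s)                          # server sends a token
                messages += 1
        elif u < r_arr + r_srv + r_off:                 # idle timeout: power down
            s = rng.choice(tuple(idle_on))
            idle_on.remove(s)
            off.add(s)                                  # server revokes its token
            messages += 1
        else:                                           # setup completes
            s = rng.choice(tuple(setup))
            setup.remove(s)
            if queue[s] > 0:
                busy.add(s)
            else:
                idle_on.add(s)                          # woke with no work: send a token
                messages += 1

    return {
        "p_no_token": no_token / max(arrivals, 1),
        "idle_energy_share": idle_area / powered_area if powered_area > 0 else 0.0,
        "msgs_per_task": messages / max(arrivals, 1),
    }
```

A call such as `print(simulate(n=200, lam=0.9))` prints the three metrics for a larger system. Each message is triggered by a task completion, a wake-up, or the revocation of an earlier token, so the message count per task stays bounded by a constant regardless of `n`. Comparing runs with growing `n` is one way to explore the sketch, but it makes no claim of reproducing the paper's fluid-limit results.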
