Abstract
A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and a major implementation burden, or may not be viable at all.
Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations. Extensive simulation experiments corroborate the fluid-limit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.
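One way to read the two asymptotic claims above in symbols is sketched below. All notation is assumed here for illustration and is not taken from the source: N is the nominal number of servers, \lambda(t) a per-server arrival rate, W^N the waiting time of a typical task in the N-th system, and E^N_idle and E^N_total the energy consumed by idle servers and by all servers over a fixed time horizon. The mode of convergence and the choice of normalization are likewise assumptions.

```latex
% Assumed scaling: traffic volume and nominal capacity grow in proportion.
\lambda^N(t) = N \, \lambda(t), \qquad N \to \infty .

% Claim 1 (delay): the waiting time of tasks vanishes in the limit.
W^N \longrightarrow 0 \quad \text{as } N \to \infty .

% Claim 2 (energy): the relative share of energy consumed by idle
% servers vanishes in the limit (normalization by total energy is an
% assumed interpretation of "relative energy portion").
\frac{E^N_{\mathrm{idle}}}{E^N_{\mathrm{total}}} \longrightarrow 0
\quad \text{as } N \to \infty .
```

Under this reading, the scheme is asymptotically optimal on both criteria at once: tasks essentially never wait, and essentially no energy is spent on servers that are switched on but idle.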
Optimal Service Elasticity in Large-Scale Distributed Systems
SIGMETRICS '17 Abstracts: Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems