Abstract
This paper studies optimal control subject to changing conditions. This area has recently received considerable attention because it arises in numerous practical situations. Examples include cloud computing systems in which the arrival rates of new jobs fluctuate over time, and systems with time-varying capacity, as encountered in power-aware systems or wireless downlink channels.
To study this, we focus on a restless bandit model, which has proved to be a powerful stochastic optimization framework for modeling the scheduling of activities. In particular, it has been applied extensively to the optimal control of computing systems. This paper is a first step toward the optimal control of restless bandits subject to changing conditions, the latter being modeled by Markov-modulated environments. We consider the restless bandit problem in an asymptotic regime, obtained by letting the population of bandits grow large and letting the environment change relatively fast. We present sufficient conditions for a policy to be asymptotically optimal and show that a set of priority policies satisfies them. Under an indexability assumption, an averaged version of Whittle's index policy is proved to belong to this set of asymptotically optimal policies. The performance of the averaged Whittle's index policy is evaluated numerically for a multi-class scheduling problem in a wireless downlink subject to changing conditions. Keeping the number of bandits constant, we observe that the averaged Whittle's index policy becomes close to optimal as the speed of the modulated environment increases.
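To make the notions of a Markov-modulated environment and a priority policy more concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a finite environment described by a hypothetical generator matrix, approximates its stationary distribution, averages hypothetical environment-dependent priority values under that distribution, and activates the bandits with the largest averaged values. All function names, the example numbers, and the choice to average per-environment values directly are illustrative assumptions; the averaged Whittle's index itself is defined in the full paper and may be built differently.

```python
from typing import List, Sequence


def stationary_distribution(generator: Sequence[Sequence[float]],
                            iterations: int = 5000) -> List[float]:
    """Approximate the stationary distribution of a continuous-time Markov
    chain (the environment) by uniformization followed by power iteration.
    Assumes an irreducible chain with at least one positive exit rate."""
    n = len(generator)
    # Uniformization constant strictly larger than every exit rate, so the
    # induced discrete-time chain has self-loops and is aperiodic.
    lam = 2.0 * max(-generator[i][i] for i in range(n))
    p = [[(1.0 if i == j else 0.0) + generator[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def averaged(values_per_env: Sequence[float], pi: Sequence[float]) -> float:
    """Average an environment-dependent quantity under the distribution pi."""
    return sum(weight * value for weight, value in zip(pi, values_per_env))


def priority_policy(indices: Sequence[float], budget: int) -> List[int]:
    """Return the (sorted) positions of the `budget` bandits with the
    largest index values; these are the bandits made active."""
    order = sorted(range(len(indices)), key=lambda k: indices[k], reverse=True)
    return sorted(order[:budget])


# Hypothetical two-state environment: leaves state 0 at rate 1, state 1 at rate 3.
env_generator = [[-1.0, 1.0], [3.0, -3.0]]
pi = stationary_distribution(env_generator)

# Hypothetical priority values of three bandits in each environment state
# (stand-ins for illustration, not Whittle indices computed from a model).
values = [[2.0, 0.5], [1.0, 3.0], [1.2, 1.2]]
avg_indices = [averaged(v, pi) for v in values]

# Activate at most two bandits, ranked by averaged priority value.
active = priority_policy(avg_indices, budget=2)
```

The sketch ranks bandits by a single averaged number rather than by a value that tracks the current environment state, which mirrors the intuition that a fast-changing environment is effectively seen through its stationary behavior.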
Asymptotic Optimal Control of Markov-Modulated Restless Bandits
SIGMETRICS '18: Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems