Asymptotic Optimal Control of Markov-Modulated Restless Bandits

Published: 03 April 2018
Abstract

This paper studies optimal control subject to changing conditions, an area that has recently received considerable attention because it arises in many practical situations. Examples include cloud computing systems, where the arrival rates of new jobs fluctuate over time, and the time-varying capacity encountered in power-aware systems or wireless downlink channels.

To study this, we focus on a restless bandit model, which has proved to be a powerful stochastic optimization framework for modeling the scheduling of activities. In particular, it has been applied extensively to the optimal control of computing systems. This paper is a first step toward the optimal control of restless bandits subject to changing conditions, which are modeled by Markov-modulated environments. We consider the restless bandit problem in an asymptotic regime, obtained by letting the population of bandits grow large and letting the environment change relatively fast. We present sufficient conditions for a policy to be asymptotically optimal and show that a set of priority policies satisfies them. Under an indexability assumption, an averaged version of Whittle's index policy is proved to belong to this set of asymptotically optimal policies. The performance of the averaged Whittle's index policy is evaluated numerically for a multi-class scheduling problem in a wireless downlink subject to changing conditions. While keeping the number of bandits constant, we observe that the averaged Whittle's index policy becomes close to optimal as the speed of the modulated environment increases.
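The fast-environment regime mentioned above can be illustrated with a small sketch of the averaging idea: when the modulating Markov chain moves quickly, the bandits effectively see parameters averaged over the environment's stationary distribution. The abstract does not give the paper's definition of the averaged Whittle's index, so the code below does not compute it. The generator `Q`, the per-state `arrival_rates`, and the function names are hypothetical values and helpers chosen only for illustration.

```python
def stationary_distribution(Q, iters=1000):
    """Stationary distribution of a continuous-time Markov chain with
    generator matrix Q, computed by uniformization and power iteration."""
    n = len(Q)
    # Uniformization rate strictly above the fastest exit rate, so the
    # resulting discrete-time chain keeps self-loops and is aperiodic.
    rate = 1.1 * max(-Q[i][i] for i in range(n))
    # Transition matrix of the uniformized chain: P = I + Q / rate.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def averaged(values, pi):
    """Average a per-environment-state parameter under distribution pi."""
    return sum(p * v for p, v in zip(pi, values))


# Hypothetical two-state environment: leaves state 0 at rate 1, state 1 at rate 3.
Q = [[-1.0, 1.0],
     [3.0, -3.0]]
# Hypothetical job arrival rate in each environment state.
arrival_rates = [2.0, 6.0]

pi = stationary_distribution(Q)
avg_rate = averaged(arrival_rates, pi)
print(pi, avg_rate)
```

For this hypothetical environment the chain spends three quarters of the time in state 0, so the averaged arrival rate is 0.75 * 2 + 0.25 * 6 = 3. Multiplying every entry of `Q` by a common speed factor leaves the stationary distribution unchanged, so the averaged parameter does not depend on how fast the environment moves.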
