Abstract
This paper addresses a generic sequential resource allocation problem, where in each round a decision maker selects an allocation of resources (servers) to a set of tasks, each consisting of a large number of jobs. A job of task $i$ assigned to server $j$ is successfully treated with probability $\theta_{ij}$ in a round, and the decision maker is informed at the end of the round of whether the job was completed. The probabilities $\theta_{ij}$ are initially unknown and have to be learned. The objective of the decision maker is to sequentially assign jobs of the various tasks to servers so that it rapidly learns and converges to the Proportionally Fair (PF) allocation (or other similar allocations achieving an appropriate trade-off between efficiency and fairness). We formulate the problem as a multi-armed bandit (MAB) optimization problem, and devise sequential assignment algorithms with low regret, defined as the difference in utility achieved over a given number of slots by an oracle algorithm aware of the $\theta_{ij}$'s and by the proposed algorithm. We first establish the properties of the so-called Restricted-PF (RPF) allocation, obtained by assuming that each task can only use a single server, and in particular show that it is very close to the PF allocation. We devise ES-RPF, an algorithm that learns the RPF allocation with regret no greater than $\mathcal{O}\bigl(\frac{m^3}{\theta_{\min}\Delta_{\min}}\log(T)\bigr)$ after $T$ slots, where $m$, $\theta_{\min}$, and $\Delta_{\min}$ denote the number of tasks, the minimum success rate $\min_{i,j}\theta_{ij}$, and an appropriately defined notion of gap, respectively. We further provide regret lower bounds satisfied by any algorithm targeting the RPF allocation. Finally, we present ES-PF, an algorithm that directly learns the PF allocation, and prove that its regret does not exceed $\mathcal{O}\bigl(\frac{m^2 s}{\theta_{\min}}\sqrt{T}\log(T)\bigr)$ after $T$ slots, where $s$ denotes the number of servers.
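To make the oracle benchmark concrete, the sketch below computes a PF allocation for known success probabilities under one common formulation of proportional fairness: maximize the sum of log service rates subject to per-server capacity. This is an illustrative assumption — the paper's exact allocation model, the function name `pf_allocation`, and the capacity constraints are not taken from the abstract.

```python
import numpy as np
from scipy.optimize import minimize

def pf_allocation(theta, eps=1e-9):
    """Sketch of the oracle's PF allocation for known rates theta (m tasks x s servers).

    Assumed model (hypothetical): maximize sum_i log(sum_j x_ij * theta_ij)
    subject to sum_i x_ij <= 1 for each server j and 0 <= x_ij <= 1.
    """
    m, s = theta.shape

    def neg_utility(x_flat):
        x = x_flat.reshape(m, s)
        rates = (x * theta).sum(axis=1)   # per-task service rate
        return -np.log(rates + eps).sum() # negated PF objective (to minimize)

    # One inequality constraint per server: its total assigned load is at most 1.
    cons = [{"type": "ineq",
             "fun": (lambda x_flat, j=j: 1.0 - x_flat.reshape(m, s)[:, j].sum())}
            for j in range(s)]

    x0 = np.full(m * s, 1.0 / m)          # feasible uniform start
    res = minimize(neg_utility, x0, bounds=[(0.0, 1.0)] * (m * s),
                   constraints=cons, method="SLSQP")
    return res.x.reshape(m, s)
```

For instance, with two tasks and two servers where each task is well matched to one server (`theta = [[0.9, 0.1], [0.1, 0.9]]`), the solver concentrates each task on its good server, which matches the intuition behind the Restricted-PF allocation in which each task uses a single server.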
Learning Proportionally Fair Allocations with Low Regret
SIGMETRICS '18: Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems