Abstract
We consider combinatorial semi-bandits over a set of arms X ⊂ {0,1}^d where rewards are uncorrelated across items. For this problem, the ESCB algorithm yields the smallest known regret bound, R(T) = O(d (ln m)^2 (ln T) / Δ_min) after T rounds, where m = max_{x ∈ X} 1^⊤ x. However, ESCB has computational complexity O(|X|), which is typically exponential in d, so it cannot be used in large dimensions. We propose the first algorithm for this problem that is both computationally and statistically efficient, with regret R(T) = O(d (ln m)^2 (ln T) / Δ_min) and asymptotic computational complexity O(δ_T^{-1} poly(d)), where δ_T is a function that vanishes arbitrarily slowly. Our approach is to carefully design AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over X can be solved up to a given approximation ratio, AESCB can be implemented in polynomial time O(δ_T^{-1} poly(d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
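The computational idea in the abstract — replacing an exhaustive maximization of a nonlinear optimistic index with a sweep of budgeted *linear* maximizations — can be illustrated with a toy sketch. Everything below is illustrative, not the paper's actual algorithm: the index form, the function names (`escb_index`, `budgeted_linear_max`, `aescb_arm`), the discretized budget grid, and the brute-force oracle (which the paper replaces with an efficient approximation oracle for structured X) are all assumptions made for the example.

```python
import math

def escb_index(x, mu_hat, conf, t):
    """Optimistic ESCB-style index (sketch): empirical reward plus a
    confidence bonus that is concave (square root) in a sum of
    per-item confidence terms, hence not linear in x."""
    reward = sum(m * xi for m, xi in zip(mu_hat, x))
    bonus = math.sqrt(math.log(t + 1) * sum(c * xi for c, xi in zip(conf, x)))
    return reward + bonus

def budgeted_linear_max(X, weights, costs, budget):
    """Toy oracle: maximize a linear function over X subject to a
    linear budget constraint (here, costs^T x >= budget, so the
    confidence term is guaranteed to reach the budget level).
    Brute force over X for illustration only."""
    best, best_val = None, -math.inf
    for x in X:
        if sum(c * xi for c, xi in zip(costs, x)) >= budget:
            val = sum(w * xi for w, xi in zip(weights, x))
            if val > best_val:
                best, best_val = x, val
    return best

def aescb_arm(X, mu_hat, conf, t, budgets):
    """AESCB-style selection (sketch): sweep a grid of budget levels
    for the confidence term, solve one budgeted linear maximization
    per level, and keep the candidate with the best index. Only
    linear objectives are ever handed to the oracle."""
    best, best_idx = None, -math.inf
    for b in budgets:
        x = budgeted_linear_max(X, mu_hat, conf, b)
        if x is not None:
            idx = escb_index(x, mu_hat, conf, t)
            if idx > best_idx:
                best, best_idx = x, idx
    return best
```

The point of the decomposition is that the square-root bonus makes the index non-linear, so off-the-shelf linear maximization oracles do not apply directly; fixing the bonus's argument to a budget level restores linearity, and sweeping a slowly refined grid of levels recovers the index maximum up to the approximation needed for the regret bound.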
Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
SIGMETRICS '21: Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems