
Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Published: 22 February 2021

Abstract

We consider combinatorial semi-bandits over a set of arms $X \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O( d (\ln m)^2 (\ln T) / \Delta_{\min} )$ after $T$ rounds, where $m = \max_{x \in X} 1^\top x$. However, ESCB has computational complexity $O(|X|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O( d (\ln m)^2 (\ln T) / \Delta_{\min} )$ and asymptotic computational complexity $O(\delta_T^{-1} \mathrm{poly}(d))$, where $\delta_T$ is a function which vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over $X$ can be solved up to a given approximation ratio, AESCB is implementable in polynomial time $O(\delta_T^{-1} \mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
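
To make the O(|X|) cost mentioned in the abstract concrete, the following is a minimal sketch of a brute-force, ESCB-style action selection over a tiny action set. The abstract does not state the index itself, so the form used here (empirical mean plus a joint square-root confidence bonus) is an assumption modeled on the ESCB index from the combinatorial semi-bandit literature. The helper names, the choice of all m-subsets as the action set, and the numbers are hypothetical and for illustration only.

```python
import itertools
import math


def escb_style_index(x, theta_hat, counts, f_t):
    """Assumed ESCB-style index: empirical reward plus a joint confidence bonus.

    x         : binary action vector in {0,1}^d
    theta_hat : empirical mean reward of each item
    counts    : number of times each item has been observed (all > 0)
    f_t       : exploration level at the current round (assumed, e.g. ln t)
    """
    mean = sum(xi * th for xi, th in zip(x, theta_hat))
    bonus = math.sqrt(0.5 * f_t * sum(xi / n for xi, n in zip(x, counts)))
    return mean + bonus


def brute_force_argmax(actions, theta_hat, counts, f_t):
    """Scan every action in X once, so the cost grows linearly in |X|."""
    return max(actions, key=lambda x: escb_style_index(x, theta_hat, counts, f_t))


def m_subsets(d, m):
    """Hypothetical action set: all binary vectors with exactly m ones."""
    for support in itertools.combinations(range(d), m):
        yield tuple(1 if i in support else 0 for i in range(d))


d, m = 4, 2
actions = list(m_subsets(d, m))   # |X| = C(4, 2) = 6 here, C(d, m) in general
theta_hat = [0.9, 0.8, 0.1, 0.2]  # illustrative empirical means
counts = [50, 50, 1, 50]          # item 2 has been observed only once
best = brute_force_argmax(actions, theta_hat, counts, f_t=math.log(100))
print(best)
```

With these illustrative numbers the selected action includes the rarely observed item 2, because its confidence bonus dominates. The enumeration in `brute_force_argmax` is the step that becomes infeasible when |X| grows exponentially in d, which is the cost the abstract says AESCB avoids by solving budgeted linear maximization problems instead.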

