Abstract
We study the online decision problem in which the set of available actions varies over time, also called the sleeping experts problem. We consider the setting in which the performance comparison is made with respect to the best ordering of actions in hindsight. In this article, both the payoff function and the availability of actions are adversarial. Kleinberg et al. [2010] gave a computationally efficient no-regret algorithm in the setting in which payoffs are stochastic. Kanade et al. [2009] gave an efficient no-regret algorithm in the setting in which action availability is stochastic.
However, whether a computationally efficient no-regret algorithm exists in the adversarial setting was posed as an open problem by Kleinberg et al. [2010]. We show that such an algorithm would imply an efficient algorithm for PAC learning DNF, a long-standing important open problem. We also consider the setting in which the number of available actions is restricted, and study its relation to agnostically learning monotone disjunctions over examples with bounded Hamming weight.
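To make the setting concrete, the standard (but computationally inefficient) no-regret baseline runs Hedge with one expert per ordering of the actions; each ordering plays its highest-ranked available action each round. The sketch below is illustrative only (the toy adversary, function name, and parameters are not from the article); its per-round cost is O(n!), which is exactly the inefficiency the hardness result says is hard to avoid.

```python
# Hedge (multiplicative weights) over all n! orderings of n actions,
# for the sleeping-experts setting: each round the adversary fixes an
# available set and payoffs; an ordering "plays" its top available action.
# Illustrative sketch with a random adversary, not the paper's algorithm.
import itertools
import math
import random

def hedge_over_orderings(n, rounds, eta=0.5, seed=0):
    rng = random.Random(seed)
    orderings = list(itertools.permutations(range(n)))
    weights = [1.0] * len(orderings)
    total_payoff = 0.0
    for _ in range(rounds):
        # Toy adversary: random availability (never empty) and payoffs in [0, 1].
        available = {a for a in range(n) if rng.random() < 0.7} or {0}
        payoffs = [rng.random() for _ in range(n)]
        # Each ordering selects its highest-ranked available action.
        choices = [next(a for a in order if a in available) for order in orderings]
        # Sample an ordering with probability proportional to its weight.
        idx = rng.choices(range(len(orderings)), weights=weights)[0]
        total_payoff += payoffs[choices[idx]]
        # Full-information multiplicative-weights update per ordering.
        weights = [w * math.exp(eta * payoffs[c]) for w, c in zip(weights, choices)]
    return total_payoff
```

Maintaining one weight per ordering yields the usual Hedge regret guarantee against the best ordering in hindsight, but over n! experts; the article's reduction shows that collapsing this to polynomial time in the fully adversarial setting would yield an efficient PAC learner for DNF.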
- J. Abernethy. 2010. Can we learn to gamble efficiently? (open problem). In Proceedings of the 23rd Annual Conference on Learning Theory. 318--319.
- S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. 1998. Proof verification and the hardness of approximation problems. J. ACM 45, 501--555.
- S. Ben-David, D. Pál, and S. Shalev-Shwartz. 2009. Agnostic online learning. In Proceedings of the 22nd Annual Conference on Learning Theory.
- A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire. 2011. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR Proceedings Track, 19--26.
- A. Blum and Y. Mansour. 2007. From external to internal regret. J. Machine Learn. Res. 8, 1307--1324.
- N. Cesa-Bianchi, A. Conconi, and C. Gentile. 2004. On the generalization ability of on-line learning algorithms. IEEE Trans. Inform. Theory 50, 9, 2050--2057.
- N. Cesa-Bianchi and G. Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
- D. P. Dubhashi and A. Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.
- M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang. 2011. Efficient optimal learning for contextual bandits. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. 169--178.
- Y. Freund and R. E. Schapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the 2nd European Conference on Computational Learning Theory. 23--37.
- Y. Freund, R. E. Schapire, Y. Singer, and M. K. Warmuth. 1997. Using and combining predictors that specialize. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing. ACM, New York, NY, 334--343.
- J. Håstad. 2001. Some optimal inapproximability results. J. ACM 48, 798--859.
- D. Haussler. 1992. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inform. Comput. 100, 78--150.
- E. Hazan, S. Kale, and S. Shalev-Shwartz. 2012. Near-optimal algorithms for online matrix prediction. In Proceedings of the 25th Annual Conference on Learning Theory, Vol. 23. JMLR Proceedings Track, 38.1--38.13.
- A. T. Kalai, V. Kanade, and Y. Mansour. 2009. Reliable agnostic learning. In Proceedings of the 22nd Annual Conference on Learning Theory.
- A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. 2005. Agnostically learning halfspaces. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science. IEEE.
- V. Kanade, B. McMahan, and B. Bryan. 2009. Sleeping experts and bandits with stochastic action availability and adversarial rewards. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics. JMLR Proceedings Track, 272--279.
- M. J. Kearns, R. E. Schapire, and L. M. Sellie. 1994. Toward efficient agnostic learning. Machine Learn. 17, 2--3, 115--141.
- R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma. 2010. Regret bounds for sleeping experts and bandits. Machine Learn. 80, 2--3, 245--272.
- A. R. Klivans and R. A. Servedio. 2001. Learning DNF in time 2^{Õ(n^{1/3})}. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing. ACM, New York, NY, 258--265.
- A. R. Klivans and A. Sherstov. 2007. A lower bound for agnostically learning disjunctions. In Proceedings of the 20th Annual Conference on Learning Theory. 409--423.
- J. Langford and T. Zhang. 2007. The epoch-greedy algorithm for contextual multi-armed bandits. In Proceedings of the Annual Conference on Neural Information Processing Systems.
- N. Littlestone. 1989. From on-line to batch learning. In Proceedings of the 2nd Annual Workshop on Computational Learning Theory. 269--284.
- C. H. Papadimitriou and M. Yannakakis. 1991. Optimization, approximation, and complexity classes. J. Comput. System Sci. 43, 3, 425--440.
- S. Shalev-Shwartz, O. Shamir, and K. Sridharan. 2010. Learning kernel-based halfspaces with the zero-one loss. In Proceedings of the 23rd Annual Conference on Learning Theory. 441--450.
- L. G. Valiant. 1984. A theory of the learnable. Commun. ACM 27, 11, 1134--1142.
Learning Hurdles for Sleeping Experts. ITCS '12: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference.