Abstract
In many online learning paradigms, convexity plays a central role in the derivation and analysis of online learning algorithms. These results, however, do not extend to non-convex settings, which a large number of recent applications require. The Online Non-Convex Learning (ONCO) problem generalizes the classic Online Convex Optimization (OCO) framework by relaxing the convexity assumption on the cost function (to a Lipschitz continuous function) and on the decision set. The state-of-the-art result for ONCO shows that the classic online exponential weighting algorithm attains a sublinear regret of $O(\sqrt{T \log T})$. The regret lower bound for OCO, however, is $\Omega(\sqrt{T})$, and to the best of our knowledge, no existing result for the ONCO problem achieves this bound. This paper proposes the Online Recursive Weighting (ORW) algorithm with regret $O(\sqrt{T})$, matching the tight regret lower bound for the OCO problem and closing the regret gap between the state-of-the-art results for online convex and non-convex optimization.
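The exponential weighting baseline mentioned above can be illustrated with a short, self-contained sketch: Hedge run over a uniform discretization of $[0,1]$ against 1-Lipschitz losses. This is not the paper's ORW algorithm; the grid size, loss sequence, and learning rate $\eta = \sqrt{2 \ln K / T}$ are illustrative assumptions.

```python
import math
import random

def hedge(losses, eta):
    """Exponential weighting (Hedge) over K discretized actions.

    losses: T rounds, each a list of K losses in [0, 1].
    Returns the learner's cumulative expected loss.
    """
    K = len(losses[0])
    weights = [1.0] * K
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        probs = [w / z for w in weights]            # normalized play distribution
        total += sum(p * l for p, l in zip(probs, round_losses))
        # Multiplicative update: downweight actions in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total

random.seed(0)
T, K = 2000, 50
xs = [k / (K - 1) for k in range(K)]               # discretized decision set in [0, 1]
losses = []
for _ in range(T):
    c = random.random()
    losses.append([min(1.0, abs(x - c)) for x in xs])  # a 1-Lipschitz, non-convex-friendly loss

eta = math.sqrt(2 * math.log(K) / T)
learner = hedge(losses, eta)
best = min(sum(l[k] for l in losses) for k in range(K))
regret = learner - best
# Standard Hedge guarantee (up to discretization error): regret = O(sqrt(T log K)).
print(regret <= math.sqrt(2 * T * math.log(K)))
```

Discretizing a Lipschitz decision set and running an experts algorithm is the standard reduction behind the $O(\sqrt{T \log T})$ bound; the grid must grow with $T$, which is where the extra $\sqrt{\log T}$ factor enters.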
An Optimal Algorithm for Online Non-Convex Learning