Research Article

An Optimal Algorithm for Online Non-Convex Learning

Published: 13 June 2018

Abstract

In many online learning paradigms, convexity plays a central role in the derivation and analysis of online learning algorithms. These results, however, do not extend to non-convex settings, while non-convexity is required by a large number of recent applications. The Online Non-Convex Learning (ONCO) problem generalizes the classic Online Convex Optimization (OCO) framework by relaxing the convexity assumption on the cost function (to Lipschitz continuity) and on the decision set. The state-of-the-art result for ONCO shows that the classic online exponential weighting algorithm attains a sublinear regret of $O(\sqrt{T \log T})$. The regret lower bound for OCO, however, is $\Omega(\sqrt{T})$, and, to the best of our knowledge, no existing result for the ONCO problem achieves this bound. This paper proposes the Online Recursive Weighting (ORW) algorithm with regret $O(\sqrt{T})$, matching the tight regret lower bound for the OCO problem and closing the regret gap between the state-of-the-art results for online convex and non-convex optimization.
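The exponential weighting baseline mentioned in the abstract can be illustrated on a discretized decision set: the learner maintains multiplicative weights over grid points of $[0,1]$ and plays the induced distribution against Lipschitz losses. The sketch below is illustrative only — the grid size, the loss sequence $f_t(x) = |x - a_t|$, and the learning-rate choice are our assumptions, not the paper's ORW algorithm.

```python
import numpy as np

def exp_weights_on_grid(losses, eta):
    """Exponential weighting (Hedge) over a fixed grid of actions.

    losses: (T, K) array with losses[t, k] = cost of grid point k at
            round t, assumed to lie in [0, 1].
    eta:    learning rate.
    Returns (algorithm's cumulative expected loss,
             cumulative loss of the best fixed grid point in hindsight).
    """
    T, K = losses.shape
    log_w = np.zeros(K)                  # log-weights, start uniform
    alg_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())  # stable softmax of log-weights
        p /= p.sum()                     # distribution over grid points
        alg_loss += p @ losses[t]        # expected loss this round
        log_w -= eta * losses[t]         # multiplicative-weights update
    best_fixed = losses.sum(axis=0).min()
    return alg_loss, best_fixed

# 1-Lipschitz losses f_t(x) = |x - a_t| on a grid of [0, 1]:
rng = np.random.default_rng(0)
T, K = 2000, 64
grid = np.linspace(0.0, 1.0, K)
targets = rng.random(T)
losses = np.abs(grid[None, :] - targets[:, None])

eta = np.sqrt(8.0 * np.log(K) / T)       # standard Hedge tuning
alg, best = exp_weights_on_grid(losses, eta)
regret = alg - best
print(regret)
```

With $K$ grid points, Hedge guarantees regret $O(\sqrt{T \log K})$ against the best grid point; balancing the discretization error of a Lipschitz loss against this term is what yields the $O(\sqrt{T \log T})$ rate the abstract attributes to exponential weighting, and the paper's contribution is removing the $\sqrt{\log T}$ factor.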

