DOI: 10.1145/3511808.3557234

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

Authors: Runxue Bao, Bin Gu, and Heng Huang
Published: 17 October 2022

ABSTRACT

Sparsity-regularized loss minimization problems play an important role in various fields, including machine learning, data mining, and modern statistics. The proximal gradient descent method and the coordinate descent method are the most popular approaches to solving such problems. Although existing methods can achieve implicit model identification, also known as support set identification, within a finite number of iterations, they still suffer from large computational and memory costs in high-dimensional settings. The reason is that the support set identification in these methods is implicit and therefore cannot exploit the low-complexity structure in practice; that is, they cannot discard the useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity-regularized loss minimization, which reduces the number of block iterations by eliminating inactive coefficients during the optimization process, thereby achieving faster explicit model identification and improving algorithmic efficiency. Theoretically, we first prove that ADSGD achieves a linear convergence rate with lower overall computational complexity. More importantly, we prove that ADSGD achieves a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of the proposed method.
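To make the idea behind explicit model identification concrete, the sketch below (plain NumPy) shows a doubly stochastic proximal update that samples both a mini-batch of samples and a random block of coordinates, and drops coordinates from the active set once soft-thresholding sets them to zero. The lasso-type objective, the function names, and the naive zero-based pruning rule are illustrative assumptions for this sketch, not the paper's actual ADSGD algorithm.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (element-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def doubly_stochastic_prox_sketch(A, b, lam, step=1e-2, n_iters=2000,
                                  block_size=10, batch_size=32, seed=0):
    """Toy doubly stochastic proximal gradient for the lasso-type problem
        min_w (1/2n) * ||A w - b||^2 + lam * ||w||_1.

    Each iteration samples a mini-batch of rows (samples) and a random block
    of coordinates (features).  Coordinates that the proximal step sets to
    zero are removed from the active set, mimicking explicit model
    identification; this is a simplified illustration, not ADSGD itself.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    active = np.arange(d)                      # coordinates still being updated
    for _ in range(n_iters):
        if active.size == 0:
            break
        rows = rng.choice(n, size=min(batch_size, n), replace=False)
        block = rng.choice(active, size=min(block_size, active.size),
                           replace=False)
        # mini-batch gradient of the smooth part, restricted to the sampled block
        residual = A[rows] @ w - b[rows]
        grad_block = A[np.ix_(rows, block)].T @ residual / rows.size
        # proximal (soft-thresholding) update on the sampled block only
        w[block] = soft_threshold(w[block] - step * grad_block, step * lam)
        # heuristic pruning: discard updated coordinates that landed at zero
        active = np.setdiff1d(active, block[w[block] == 0.0])
    return w
```

On synthetic data with a sparse ground truth, the active set typically shrinks quickly, so later iterations touch far fewer coordinates; practical algorithms replace the naive pruning above with safe screening or identification guarantees such as those the paper establishes for ADSGD.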


Published in

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022, 5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 October 2022


          Qualifiers

          • research-article

          Acceptance Rates

CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

