DOI: 10.1145/3511808.3557150
research-article

Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints

Published: 17 October 2022

ABSTRACT

Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning, as it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted BCS3VM) to avoid the degenerate solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization (SMO) algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to invert a linear system related to the support matrix, which limits its scalability. To make BCS3VM more practical for large-scale problems, in this paper we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted TSG-BCS3VM). Specifically, to let the balancing constraint handle different proportions of positive and negative samples among the labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled samples, unlabeled samples, and random features to update the solution, where Quasi-Monte Carlo (QMC) sampling on the random features accelerates TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only achieves good generalization performance but also enjoys better scalability than existing BCS3VM algorithms.
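Since the method hinges on QMC sampling of random features, a minimal pure-Python sketch of that building block may help. The Halton sequence, the inverse-CDF Gaussian transform, and the cosine feature map below are generic textbook choices in the spirit of random Fourier features (Rahimi and Recht) and QMC feature maps (Avron et al.); they are an illustration, not the paper's exact construction.

```python
import math
from statistics import NormalDist

def halton(n, dim):
    """First n points of the Halton low-discrepancy sequence in [0, 1)^dim."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

    def radical_inverse(i, base):
        value, denom = 0.0, 1.0
        while i > 0:
            i, digit = divmod(i, base)
            denom *= base
            value += digit / denom
        return value

    return [[radical_inverse(i, primes[d]) for d in range(dim)]
            for i in range(1, n + 1)]

def qmc_random_fourier_features(x, num_features, sigma=1.0):
    """Map x to num_features cosine features whose inner products
    approximate the RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    dim = len(x)
    std_normal = NormalDist(0.0, 1.0)
    # One extra Halton coordinate per point drives the random phase b.
    points = halton(num_features, dim + 1)
    feats = []
    for p in points:
        # Inverse-CDF transform: low-discrepancy uniforms -> Gaussian frequencies.
        w = [std_normal.inv_cdf(min(max(u, 1e-9), 1 - 1e-9)) / sigma
             for u in p[:dim]]
        b = 2.0 * math.pi * p[dim]
        proj = sum(wi * xi for wi, xi in zip(w, x)) + b
        feats.append(math.sqrt(2.0 / num_features) * math.cos(proj))
    return feats
```

With explicit features like these, the non-linear S3VM objective reduces to a linear model in feature space, so a triply-stochastic-style update need only touch one sampled labeled example, one sampled unlabeled example, and a batch of sampled features per iteration.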

References

  1. Haim Avron, Vikas Sindhwani, Jiyan Yang, and Michael W. Mahoney. 2016. Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. Journal of Machine Learning Research 17, 120 (2016), 1--38. http://jmlr.org/papers/v17/14-538.html
  2. Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka, and Tom M. Mitchell. 2010. Coupled Semi-Supervised Learning for Information Extraction. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (New York, New York, USA) (WSDM '10). Association for Computing Machinery, New York, NY, USA, 101--110. https://doi.org/10.1145/1718487.1718501
  3. Luigi Carratino, Alessandro Rudi, and Lorenzo Rosasco. 2018. Learning with SGD and random features. In Advances in Neural Information Processing Systems. 10213--10224.
  4. Olivier Chapelle. 2007. Training a Support Vector Machine in the Primal. Neural Computation 19, 5 (2007), 1155--1178. https://doi.org/10.1162/neco.2007.19.5.1155
  5. Olivier Chapelle, Mingmin Chi, and Alexander Zien. 2006. A Continuation Method for Semi-Supervised SVMs. In Proceedings of the 23rd International Conference on Machine Learning (Pittsburgh, Pennsylvania, USA) (ICML '06). Association for Computing Machinery, New York, NY, USA, 185--192. https://doi.org/10.1145/1143844.1143868
  6. Olivier Chapelle, Vikas Sindhwani, and Sathiya Keerthi. 2006. Branch and bound for semi-supervised support vector machines. Advances in Neural Information Processing Systems 19 (2006).
  7. Olivier Chapelle, Vikas Sindhwani, and Sathiya S. Keerthi. 2008. Optimization techniques for semi-supervised support vector machines. Journal of Machine Learning Research 9, Feb (2008), 203--233.
  8. Olivier Chapelle and Alexander Zien. 2005. Semi-supervised classification by low density separation. In AISTATS, Vol. 2005. Citeseer, 57--64.
  9. Ronan Collobert, Fabian Sinz, Jason Weston, and Léon Bottou. 2006. Large scale transductive SVMs. Journal of Machine Learning Research 7, Aug (2006), 1687--1712.
  10. Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, and Le Song. 2014. Scalable kernel methods via doubly stochastic gradients. arXiv preprint arXiv:1407.5599 (2014).
  11. Xiang Geng, Bin Gu, Xiang Li, Wanli Shi, Guansheng Zheng, and Heng Huang. 2019. Scalable semi-supervised SVM via triply stochastic gradients. arXiv preprint arXiv:1907.11584 (2019).
  12. Saeed Ghadimi and Guanghui Lan. 2013. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23, 4 (2013), 2341--2368.
  13. Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. 2012. Sparse Quasi-Newton Optimization for Semi-supervised Support Vector Machines. In ICPRAM (1). 45--54.
  14. Bin Gu, Zhouyuan Huo, Cheng Deng, and Heng Huang. 2018. Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 80), Jennifer Dy and Andreas Krause (Eds.). PMLR, 1812--1821. https://proceedings.mlr.press/v80/gu18a.html
  15. Bin Gu, Yingying Shan, Xiang Geng, and Guansheng Zheng. 2018. Accelerated Asynchronous Greedy Coordinate Descent Algorithm for SVMs. In IJCAI. 2170--2176.
  16. Bin Gu, Xiao-Tong Yuan, Songcan Chen, and Heng Huang. 2018. New Incremental Learning Algorithm for Semi-Supervised Support Vector Machine. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1475--1484.
  17. Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid. 2010. Multimodal semi-supervised learning for image classification. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 902--909. https://doi.org/10.1109/CVPR.2010.5540120
  18. Einar Hille. 1972. Introduction to general theory of reproducing kernels. The Rocky Mountain Journal of Mathematics 2, 3 (1972), 321--368.
  19. Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. 2008. Kernel methods in machine learning. The Annals of Statistics 36, 3 (2008), 1171--1220.
  20. Amir Hussain and Erik Cambria. 2018. Semi-supervised learning for big social data analysis. Neurocomputing 275 (2018), 1662--1673.
  21. Thorsten Joachims. 1999. Transductive inference for text classification using support vector machines. In ICML, Vol. 99. 200--209.
  22. Trung Le, Phuong Duong, Mi Dinh, Tu Dinh Nguyen, Vu Nguyen, and Dinh Q. Phung. 2016. Budgeted Semi-supervised Support Vector Machine. In UAI.
  23. Yu-Feng Li, James Kwok, and Zhi-Hua Zhou. 2010. Cost-sensitive semi-supervised support vector machine. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 24.
  24. Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, and Fei Sha. 2015. How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets. arXiv:1411.4000 [cs.LG]
  25. Ali Rahimi, Benjamin Recht, et al. 2007. Random Features for Large-Scale Kernel Machines. In NIPS, Vol. 3. Citeseer, 5.
  26. Bernhard Schölkopf, Alexander J. Smola, Francis Bach, et al. 2002. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.
  27. Vikas Sindhwani and S. Sathiya Keerthi. 2006. Large Scale Semi-Supervised Linear SVMs. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA) (SIGIR '06). Association for Computing Machinery, New York, NY, USA, 477--484. https://doi.org/10.1145/1148170.1148253
  28. Fabian Sinz and Matteo Roffilli. 2012. UniverSVM. https://github.com/fabiansinz/UniverSVM
  29. Fabian H. Sinz, Olivier Chapelle, Alekh Agarwal, and Bernhard Schölkopf. 2007. An Analysis of Inference with the Universum. In Proceedings of the 20th International Conference on Neural Information Processing Systems (Vancouver, British Columbia, Canada) (NIPS '07). Curran Associates Inc., Red Hook, NY, USA, 1369--1376.
  30. Xilan Tian, Gilles Gasso, and Stéphane Canu. 2012. A multiple kernel framework for inductive semi-supervised SVM learning. Neurocomputing 90 (2012), 46--58.
  31. Jean-Francois Ton, Seth Flaxman, Dino Sejdinovic, and Samir Bhatt. 2018. Spatial mapping with Gaussian processes and nonstationary Fourier features. Spatial Statistics 28 (2018), 59--78.
  32. Grace Wahba. 1990. Spline Models for Observational Data. SIAM.
  33. Jason Weston, Ronan Collobert, Fabian Sinz, Léon Bottou, and Vladimir Vapnik. 2006. Inference with the Universum. In Proceedings of the 23rd International Conference on Machine Learning (Pittsburgh, Pennsylvania, USA) (ICML '06). Association for Computing Machinery, New York, NY, USA, 1009--1016. https://doi.org/10.1145/1143844.1143971
  34. Christopher Williams and Matthias Seeger. 2001. Using the Nyström Method to Speed Up Kernel Machines. In Advances in Neural Information Processing Systems 13. MIT Press, 682--688.
  35. Shuyang Yu, Bin Gu, Kunpeng Ning, Haiyan Chen, Jian Pei, and Heng Huang. 2019. Tackle Balancing Constraint for Incremental Semi-Supervised Support Vector Learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD '19). Association for Computing Machinery, New York, NY, USA, 1587--1595. https://doi.org/10.1145/3292500.3330962

Published in

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022, 5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

Copyright © 2022 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CIKM '22 paper acceptance rate: 621 of 2,257 submissions, 28%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
