ABSTRACT
The Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning because it can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to invert the linear system related to the support matrix, which limits its scalability. To make BCS3VM more practical for large-scale problems, in this paper we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted TSG-BCS3VM). Specifically, so that the balancing constraint can handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled samples, unlabeled samples, and random features to update the solution, and we apply Quasi-Monte Carlo (QMC) sampling to the random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates, where T is the number of iterations, which improves substantially on previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only achieves good generalization performance but also scales better than existing BCS3VM algorithms.
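To give a feel for the QMC random-feature idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian (RBF) kernel and a Halton low-discrepancy sequence; the abstract names neither. The function names `halton` and `qmc_fourier_features` and the bandwidth parameter `sigma` are hypothetical. The sketch builds a fixed feature map, whereas a stochastic-gradient method would draw features as it iterates.

```python
import numpy as np
from statistics import NormalDist


def halton(n_points, dim):
    """First n_points of the Halton sequence in (0, 1)^dim (index 0 is skipped)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    if dim > len(primes):
        raise ValueError("this sketch supports at most 10 dimensions")
    out = np.zeros((n_points, dim))
    for j in range(dim):
        base = primes[j]
        for i in range(n_points):
            # Radical inverse of (i + 1) in the given base.
            k, f, x = i + 1, 1.0, 0.0
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            out[i, j] = x
    return out


def qmc_fourier_features(X, n_features, sigma=1.0):
    """Map X of shape (n, d) to 2 * n_features QMC Fourier features.

    Assumption: the target kernel is exp(-||x - y||^2 / (2 * sigma^2)).
    """
    d = X.shape[1]
    u = halton(n_features, d)                      # low-discrepancy points
    inv_cdf = np.vectorize(NormalDist().inv_cdf)   # uniform -> standard normal
    W = inv_cdf(u) / sigma                         # kernel frequencies
    Z = X @ W.T                                    # projections, (n, n_features)
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(n_features)
```

Under these assumptions, the inner product of two feature vectors approximates the kernel value between the corresponding inputs, and replacing pseudo-random draws with a low-discrepancy sequence typically lowers the approximation error for a given number of features.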
Index Terms
- Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints