Abstract
Sparse subspace clustering (SSC) is a classical method for clustering data drawn from a union of low-dimensional subspaces, one subspace per group. It has many desirable theoretical properties and has proven effective in a range of applications. On large-scale datasets, however, learning the sparse sample affinity graph is computationally expensive. To reduce this cost, we develop a memory-efficient parallel framework that computes SSC with an alternating direction method of multipliers (ADMM) algorithm. The framework partitions the data matrix into column blocks and decomposes the original problem into parallel multivariate Lasso regression subproblems and samplewise operations. This lets multiple cores or machines each process individual column blocks. We also propose a stochastic optimization algorithm to minimize the objective function. Experiments on real-world datasets show that the proposed blockwise ADMM framework is substantially more efficient than the matrix-form ADMM used by the original SSC, without sacrificing application performance. Moreover, our approach applies directly to parallel neighborhood selection for structure estimation in Gaussian graphical models.
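The blockwise decomposition described above can be illustrated with a minimal sketch. It assumes the common noisy SSC formulation min_C ||C||_1 + (λ/2)||X − XC||_F^2 subject to diag(C) = 0, where the columns of X are samples. This objective separates over the columns of C. Each column block is therefore a multivariate Lasso with the shared design matrix X, and the ℓ1 step is an elementwise operation.

The following are all hypothetical choices for illustration, not the authors' implementation:
- the function names `solve_block`, `blockwise_ssc` and `affinity`;
- the parameters `lam`, `rho`, `n_iter`, `n_blocks` and `workers`;
- the use of the Woodbury identity to avoid forming the n × n Gram matrix.

The stochastic optimization algorithm mentioned in the abstract is not reproduced here.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(V, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)


def solve_block(X, cols, lam=10.0, rho=10.0, n_iter=500):
    """ADMM for one column block (multivariate Lasso with shared design X):

        min_C ||C||_1 + lam/2 * ||X[:, cols] - X @ C||_F^2
        s.t.  C[cols[k], k] = 0   (no sample represents itself)

    Only X (d x n), one d x d matrix, and n x len(cols) iterates are stored;
    the n x n Gram matrix X^T X is never formed.
    """
    d, n = X.shape
    cols = np.asarray(cols)
    m = cols.size
    Y = X[:, cols]                         # targets: the samples in this block
    self_idx = (cols, np.arange(m))        # entries constrained to zero
    # Woodbury identity (an assumption of this sketch, not taken from the paper):
    # (lam X^T X + rho I_n)^{-1} R = (R - X^T G X R) / rho,
    # with G = (rho/lam I_d + X X^T)^{-1}, a small d x d matrix.
    G = np.linalg.inv((rho / lam) * np.eye(d) + X @ X.T)
    XtY = lam * (X.T @ Y)                  # n x m, reused every iteration
    C = np.zeros((n, m))
    U = np.zeros((n, m))                   # scaled dual variable
    for _ in range(n_iter):
        R = XtY + rho * (C - U)
        A = (R - X.T @ (G @ (X @ R))) / rho    # quadratic (ridge-like) step
        C = soft_threshold(A + U, 1.0 / rho)   # l1 step, elementwise
        C[self_idx] = 0.0                      # enforce diag(C) = 0 in this block
        U += A - C                             # dual update
    return C


def blockwise_ssc(X, n_blocks=4, workers=4, **admm_kwargs):
    """Split the n columns into blocks, solve them in parallel, reassemble C."""
    n = X.shape[1]
    blocks = np.array_split(np.arange(n), n_blocks)
    # Threads stand in for the cores/machines that process individual blocks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda cols: solve_block(X, cols, **admm_kwargs), blocks))
    return np.hstack(parts)


def affinity(C):
    """Symmetric affinity W = |C| + |C|^T, the usual input to spectral clustering."""
    return np.abs(C) + np.abs(C).T
```

Every ADMM step in this sketch acts column by column. Splitting into one block or four should therefore give the same C up to floating-point rounding; the blocks only change how work and memory are distributed.

The affinity W can then be passed to a spectral clustering routine, for example scikit-learn's `SpectralClustering(affinity="precomputed")`.

Now apply the same per-column Lasso, with the self-coefficient fixed at zero, to a matrix whose columns are variables rather than samples. It then has the form of the per-variable regressions used in neighborhood selection for Gaussian graphical models.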
Parallel Sparse Subspace Clustering via Joint Sample and Parameter Blockwise Partition