Abstract
Multi-label problems arise in various domains such as multi-topic document categorization, protein function prediction, and automatic image annotation. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We further show that the proposed framework can be extended to the kernel-induced feature space. We have conducted extensive experiments on multi-topic web page categorization and automatic gene expression pattern image annotation tasks, and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.
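To make the starting point concrete, the following minimal sketch illustrates only the baseline described above, in which a multi-label problem is decomposed into one independent binary problem per label. It is not the shared-subspace formulation proposed in the paper. The perceptron learner, the toy data, and all function and variable names are assumptions introduced purely for illustration.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def to_binary_problems(Y: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """Turn label sets into one +1/-1 target vector per label."""
    return {label: [1 if label in y else -1 for y in Y] for label in labels}


def train_perceptron(X: List[Vector], targets: List[int], epochs: int = 20) -> Tuple[Vector, float]:
    """Plain perceptron with a bias term (an illustrative stand-in for any binary learner)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified or on the boundary: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
    return w, b


def train_binary_relevance(X: List[Vector], Y: List[Set[str]], labels: List[str]) -> Dict[str, Tuple[Vector, float]]:
    """Train one classifier per label; the classifiers share inputs but nothing else."""
    problems = to_binary_problems(Y, labels)
    return {label: train_perceptron(X, problems[label]) for label in labels}


def predict(models: Dict[str, Tuple[Vector, float]], x: Vector) -> Set[str]:
    """Return every label whose classifier gives a positive score."""
    return {
        label
        for label, (w, b) in models.items()
        if sum(wi * xi for wi, xi in zip(w, x)) + b > 0
    }


# Hypothetical toy data: two features, two topics, label sets may overlap or be empty.
X_toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y_toy = [{"sports"}, {"politics"}, {"sports", "politics"}, set()]
LABELS = ["sports", "politics"]

models = train_binary_relevance(X_toy, Y_toy, LABELS)
```

Because each classifier in this sketch is trained in isolation, any correlation between labels is ignored, which is the limitation that motivates sharing a common subspace among the labels.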
A shared-subspace learning framework for multi-label classification