
A shared-subspace learning framework for multi-label classification

Published: 28 May 2010

Abstract

Multi-label problems arise in various domains such as multi-topic document categorization, protein function prediction, and automatic image annotation. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlations among labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that, although the problem is nonconvex, the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We further show that the proposed framework can be extended to the kernel-induced feature space. We have conducted extensive experiments on multi-topic web page categorization and automatic gene expression pattern image annotation tasks, and the results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.
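To make the kind of formulation described above concrete, the following is a minimal sketch under assumed notation: a least-squares loss, a ridge penalty, and k labels over n training points x_i in R^d with binary targets y_i^l. All symbols here (u_l, v_l, Theta, alpha, r) are illustrative and are not taken from the paper itself:

\min_{\Theta,\,\{u_\ell, v_\ell\}} \; \sum_{\ell=1}^{k} \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \left(u_\ell + \Theta^{\top} v_\ell\right)^{\top} x_i - y_i^{\ell} \right)^{2} + \alpha \, \lVert u_\ell \rVert_2^2 \right] \quad \text{subject to } \Theta \Theta^{\top} = I_r .

In this sketch, each label \ell keeps its own full-space predictor u_\ell \in \mathbb{R}^d, while the low-dimensional component \Theta^{\top} v_\ell (with \Theta \in \mathbb{R}^{r \times d}, r \ll d) is the shared subspace that couples the labels. The orthonormality constraint on \Theta is what makes the problem nonconvex; for a least-squares loss, the optimal \Theta can nevertheless be characterized through an eigenvalue problem, consistent with the abstract's claim.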

