Abstract
This article investigates how to automatically complete the missing labels of partially annotated images without image segmentation. The label completion procedure is formulated as a nonnegative data factorization problem: the global image representations that describe entire images (for instance, various image feature descriptors) are decomposed into their corresponding label representations, which describe the local semantic regions within the images. The solution provided in this work is motivated by the following observations. First, the label representations of regions with the same label often share certain commonalities, yet may differ substantially because of large intraclass variations. Thus, each label or concept should be represented by a subspace spanned by an ensemble of bases, rather than by a single basis, to characterize the intralabel diversity. Second, the subspaces for different labels differ from each other. Third, when two images are similar to each other, their corresponding label representations should also be similar. We formulate this cross-image context, together with the given partial label annotations, in the framework of nonnegative data factorization, and then propose efficient multiplicative nonnegative update rules that alternately optimize the subspaces and the reconstruction coefficients. We also provide a theoretical proof of algorithmic convergence and correctness. Extensive experiments over several challenging image datasets demonstrate the effectiveness of the proposed solution in improving the quality of image label completion and the accuracy of image annotation. Based on the same formulation, we further develop a label ranking algorithm to refine noisy image labels without any manual supervision. We compare the proposed label ranking algorithm with state-of-the-art methods on popular evaluation databases and achieve encouraging improvements.
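To make the idea of alternating multiplicative nonnegative updates concrete, the following is a minimal sketch of the standard Lee and Seung updates for plain nonnegative matrix factorization under a Frobenius-norm objective. It is not the algorithm proposed in this article: the label-specific subspaces, the partial-annotation constraints, and the cross-image context term are all omitted. The function name `nmf_multiplicative` and its parameters (`rank`, `n_iter`, `eps`, `seed`) are assumptions introduced only for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (d x n) as W @ H.

    W (d x rank) plays the role of a nonnegative basis and H (rank x n)
    holds the nonnegative reconstruction coefficients. This is generic
    NMF, used here only as a stand-in for the article's formulation.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Strictly positive initialization keeps the multiplicative updates
    # from getting stuck at zero entries.
    W = rng.random((d, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Update the coefficients with the basis held fixed.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update the basis with the coefficients held fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Track the Frobenius reconstruction error after each sweep.
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((20, 4)) @ rng.random((4, 30))  # synthetic nonnegative data
    W, H, errors = nmf_multiplicative(X, rank=4)
    print(f"first error: {errors[0]:.4f}, last error: {errors[-1]:.4f}")
```

Because every update multiplies the current entries by a ratio of nonnegative quantities, `W` and `H` stay nonnegative without any explicit projection step, which is the property that makes multiplicative rules attractive for this family of problems.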
Image label completion by pursuing contextual decomposability