Image label completion by pursuing contextual decomposability

Published: 22 May 2012

Abstract

This article investigates how to automatically complete the missing labels of partially annotated images without image segmentation. Label completion is formulated as a nonnegative data factorization problem that decomposes the global image representations, which describe entire images (for instance, various image feature descriptors), into their corresponding label representations, which describe the local semantic regions within images. The solution provided in this work is motivated by the following observations. First, label representations of regions with the same label often share certain commonalities, yet may differ substantially because of large intraclass variations. Thus, each label or concept should be represented by a subspace spanned by an ensemble of bases, instead of a single basis, to characterize the intralabel diversity. Second, the subspaces for different labels differ from each other. Third, when two images are similar to each other, their corresponding label representations should also be similar. We formulate this cross-image context, together with the given partial label annotations, in the framework of nonnegative data factorization and then propose efficient multiplicative nonnegative update rules to alternately optimize the subspaces and the reconstruction coefficients. We also provide a theoretical proof of algorithmic convergence and correctness. Extensive experiments over several challenging image datasets demonstrate the effectiveness of the proposed solution in improving the quality of image label completion and the accuracy of image annotation. Based on the same formulation, we further develop a label ranking algorithm to refine noisy image labels without any manual supervision. We compare the proposed label ranking algorithm with state-of-the-art methods on popular evaluation databases and achieve encouraging improvements.
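
To illustrate the kind of alternating multiplicative update the abstract refers to, the following is a minimal sketch of the standard Lee and Seung multiplicative rules for plain nonnegative matrix factorization under a Frobenius-norm objective. It is not the update rule proposed in this article: the per-label subspaces, the cross-image context term, and the partial-label constraints are all omitted. The function name `nmf_multiplicative`, the parameters `rank`, `n_iter`, and `eps`, and the random test matrix are assumptions made only for illustration.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (features x images) as W @ H.

    Plain NMF with Lee-Seung multiplicative updates (Frobenius objective).
    Illustrative only: no label subspaces or cross-image context terms.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Strictly positive random initialization keeps every entry updatable.
    W = rng.random((n_features, rank)) + eps   # basis vectors (columns)
    H = rng.random((rank, n_samples)) + eps    # reconstruction coefficients
    errors = []
    for _ in range(n_iter):
        # Update the coefficients with the basis held fixed.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update the basis with the coefficients held fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Track the reconstruction error after each full alternation.
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors


# Hypothetical data: 20-dimensional descriptors for 30 images.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
W, H, errors = nmf_multiplicative(X, rank=5)
print(f"first error: {errors[0]:.4f}, last error: {errors[-1]:.4f}")
```

Because each update multiplies the current factor by a ratio of nonnegative quantities, `W` and `H` stay nonnegative without any explicit projection step, which is the property that makes multiplicative rules attractive for this family of problems.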

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 8, Issue 2
      May 2012
      144 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/2168996
      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 May 2012
      • Accepted: 1 January 2011
      • Revised: 1 December 2010
      • Received: 1 August 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
