research-article

Label-to-region with continuity-biased bi-layer sparsity priors

Published: 30 November 2012

Abstract

In this work, we investigate how to reassign labels annotated at the image level to contextually derived semantic regions, a task we call Label-to-Region (L2R), in a collective manner. Given a set of input images with label annotations, the basic idea of our approach to L2R is to first discover the patch correspondence across images, and then propagate the common labels shared by image pairs to these correlated patches. Specifically, our approach consists of the following aspects. First, each input image is encoded as a Bag-of-Hierarchical-Patch (BOP) to capture rich cues at various scales, and the individual patches are expressed by patch-level feature descriptors. Second, we present a sparse representation formulation for discovering how well an image or a semantic region can be robustly reconstructed from all the other image patches in the input image set. The underlying philosophy of our formulation is that an image region can be sparsely reconstructed from patches belonging to other images with common labels, while robust label propagation across images requires that the selected patches come from very few images. This preference for sparsity at both the patch level and the image level is named the bi-layer sparsity prior. Meanwhile, we enforce a preference for larger patches in the reconstruction, referred to as the continuity-biased prior in this work, which may further enhance the reliability of L2R assignment. Finally, we harness the reconstruction coefficients to propagate the image labels to the matched patches, and fuse the propagation results over all patches to finalize the L2R task. As a by-product, the proposed continuity-biased bi-layer sparse representation formulation can be naturally applied to annotate new testing images. Extensive experiments on three public image datasets demonstrate the effectiveness of our proposed framework in both L2R assignment and image annotation.
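
The abstract does not state the objective function, so the following is only an illustrative sketch of the idea, not the authors' formulation. It assumes a sparse-group-lasso style objective: an l1 penalty gives patch-level sparsity, a per-image l2 group penalty gives image-level sparsity, and an assumed weight `w` makes larger patches cheaper to select (the continuity bias). The solver is plain proximal gradient descent, swapped in for whatever optimizer the paper uses. All names (`bilayer_sparse_code`, `propagate_labels`, `lam_patch`, `lam_image`) and the weighting rule are hypothetical.

```python
import numpy as np

def bilayer_sparse_code(y, D, image_ids, patch_sizes,
                        lam_patch=0.05, lam_image=0.05, n_iter=500):
    """Approximately minimize, by proximal gradient descent,
        0.5*||y - D x||^2 + lam_patch * sum_i w_i*|x_i|
                          + lam_image * sum_g ||x_g||_2
    where group g holds the patches of image g."""
    # Assumed continuity bias: the largest patch gets weight 1, smaller ones more.
    w = 1.0 / np.sqrt(patch_sizes / patch_sizes.max())
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    groups = [np.flatnonzero(image_ids == g) for g in np.unique(image_ids)]
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))            # gradient step on the data term
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam_patch * w, 0.0)  # patch level
        for idx in groups:                            # image level: shrink whole groups
            norm = np.linalg.norm(z[idx])
            z[idx] *= max(0.0, 1.0 - step * lam_image / norm) if norm > 0 else 0.0
        x = z
    return x

def propagate_labels(x, image_ids, image_labels):
    """Score each label by the total coefficient mass of the images carrying it.
    image_labels is (n_images, n_labels), and image ids index its rows."""
    scores = np.zeros(image_labels.shape[1])
    for g in np.unique(image_ids):
        scores += np.abs(x[image_ids == g]).sum() * image_labels[g]
    total = scores.sum()
    return scores / total if total > 0 else scores

# Toy data: 3 images with 2 patches each; the query region is built from image 1.
D = np.eye(6)                                  # columns are patch descriptors
image_ids = np.array([0, 0, 1, 1, 2, 2])
patch_sizes = np.array([4.0, 16.0, 4.0, 16.0, 4.0, 16.0])
image_labels = np.eye(3)                       # image g carries only label g
y = 1.0 * D[:, 2] + 0.5 * D[:, 3]
x = bilayer_sparse_code(y, D, image_ids, patch_sizes)
scores = propagate_labels(x, image_ids, image_labels)
```

In this toy case the nonzero coefficients fall only on the patches of image 1, so all of the label score goes to that image's label. On real descriptors the two penalty weights would need tuning, and the per-patch scores would still have to be fused into a region-level assignment as the abstract describes.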

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 8, Issue 4
November 2012
139 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2379790

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 November 2012
• Accepted: 1 June 2011
• Revised: 1 May 2011
• Received: 1 January 2011

Qualifiers

• research-article
• Research
• Refereed