research-article

Attribute-Augmented Semantic Hierarchy: Towards a Unified Framework for Content-Based Image Retrieval

Published: 01 October 2014

Abstract

This article presents a novel attribute-augmented semantic hierarchy (A2SH) and demonstrates its effectiveness in bridging both the semantic and intention gaps in content-based image retrieval (CBIR). A2SH organizes semantic concepts into multiple semantic levels and augments each concept with a set of related attributes. The attributes describe the multiple facets of the concept and act as an intermediate bridge connecting the concept and low-level visual content. A hierarchical semantic similarity function is learned to characterize the semantic similarities among images for retrieval. To better capture user search intent, a hybrid feedback mechanism is developed, which collects feedback on both attributes and images. This feedback is then used to refine the search results based on A2SH. We use A2SH as a basis to develop a unified content-based image retrieval system. We conduct extensive experiments on a large-scale dataset of over one million Web images. Experimental results show that the proposed A2SH can characterize the semantic affinities among images accurately and can shape user search intent quickly, leading to more accurate search results than state-of-the-art CBIR solutions.
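
As a rough illustration of the structure the abstract describes, the sketch below models a concept hierarchy in which each concept carries its own attribute set, and compares two images along the part of the hierarchy they share. It is not the article's method: the article learns its hierarchical similarity function, whereas the scoring rule here is hand-written for illustration. The names `Concept`, `attribute_similarity`, and `hierarchical_similarity`, the example concepts and attributes, and the assumption that attribute scores lie in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based equality; nodes reference each other
class Concept:
    """A node in the hierarchy: a semantic concept plus its attribute set."""
    name: str
    attributes: List[str]
    parent: Optional["Concept"] = field(default=None, repr=False)
    children: List["Concept"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Concept") -> "Concept":
        """Attach `child` below this concept and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path_from_root(self) -> List["Concept"]:
        """Return the concepts from the root down to this node."""
        node: Optional["Concept"] = self
        path: List["Concept"] = []
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]


def attribute_similarity(concept: Concept,
                         scores_a: Dict[str, float],
                         scores_b: Dict[str, float]) -> float:
    """Toy rule (an assumption): 1 minus the mean absolute difference
    of the two images' scores on this concept's attributes."""
    if not concept.attributes:
        return 0.0
    diffs = [abs(scores_a.get(a, 0.0) - scores_b.get(a, 0.0))
             for a in concept.attributes]
    return 1.0 - sum(diffs) / len(diffs)


def hierarchical_similarity(leaf_a: Concept, scores_a: Dict[str, float],
                            leaf_b: Concept, scores_b: Dict[str, float]) -> float:
    """Sum the per-concept similarities over the shared prefix of the
    two root-to-leaf paths, stopping where the paths diverge."""
    total = 0.0
    for node_a, node_b in zip(leaf_a.path_from_root(), leaf_b.path_from_root()):
        if node_a is not node_b:
            break
        total += attribute_similarity(node_a, scores_a, scores_b)
    return total


# Hypothetical two-level hierarchy with made-up attributes.
animal = Concept("animal", ["furry", "four_legged"])
dog = animal.add_child(Concept("dog", ["floppy_ears", "short_snout"]))
cat = animal.add_child(Concept("cat", ["pointed_ears", "whiskers"]))

# Hypothetical attribute scores for two images.
dog_image = {"furry": 1.0, "four_legged": 1.0, "floppy_ears": 1.0, "short_snout": 0.0}
cat_image = {"furry": 1.0, "four_legged": 1.0, "pointed_ears": 1.0, "whiskers": 1.0}

same_concept = hierarchical_similarity(dog, dog_image, dog, dog_image)
sibling_concepts = hierarchical_similarity(dog, dog_image, cat, cat_image)
```

In this toy setup, two images under the same leaf concept are compared at every level of their common path, while images under sibling concepts are compared only on the attributes of their shared ancestor, so deeper shared paths give higher scores.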

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 1s
      Special Issue on Multiple Sensorial (MulSeMedia) Multimodal Media: Advances and Applications
      September 2014
      260 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/2675060

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 October 2014
      • Accepted: 1 June 2014
      • Revised: 1 May 2014
      • Received: 1 February 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
