A feature-word-topic model for image annotation and retrieval

Published: 30 September 2013

Abstract

Image annotation is the process of finding appropriate semantic labels for images so that they can be indexed and searched more conveniently on the Web. This article proposes a novel method for image annotation that combines feature-word distributions, which map from visual space to word space, with word-topic distributions, which form a structure that captures label relationships for annotation. We refer to models of this type as feature-word-topic models. The introduction of topics allows us to efficiently take word associations, such as {ocean, fish, coral} or {desert, sand, cactus}, into account for image annotation. Unlike previous topic-based methods, we do not treat topics as joint distributions of words and visual features, but as distributions of words only. Feature-word distributions are used to define the weights in the computation of topic distributions for annotation. As a result, topic models from text mining can be applied directly in our method. Our feature-word-topic model, which uses Gaussian mixtures for the feature-word distributions and probabilistic Latent Semantic Analysis (pLSA) for the word-topic distributions, obtains promising results in image annotation and retrieval.
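
The combination described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the two topics, and every number below are hypothetical. The sketch assumes that the feature-word model has already produced a score for each candidate word of one image, uses those scores as weights in a standard pLSA folding-in step (EM with the word-topic distributions held fixed), and then rescores each word by multiplying its feature-word score with its topic-based probability. The multiplicative rescoring rule in particular is an assumption made for illustration.

```python
"""Toy sketch: weight a pLSA-style topic estimate by feature-word scores."""

# Hypothetical word-topic distributions p(w | z); each row sums to 1.
WORD_GIVEN_TOPIC = [
    {"ocean": 0.38, "fish": 0.28, "coral": 0.28,
     "desert": 0.02, "sand": 0.02, "cactus": 0.02},  # "sea" topic
    {"ocean": 0.02, "fish": 0.02, "coral": 0.02,
     "desert": 0.38, "sand": 0.28, "cactus": 0.28},  # "desert" topic
]

# Hypothetical feature-word scores for one underwater photo, standing in
# for the output of a Gaussian-mixture feature-word model.
FEATURE_WORD_SCORES = {
    "ocean": 0.30, "fish": 0.25, "sand": 0.25, "coral": 0.15, "cactus": 0.05,
}


def fold_in_topics(word_weights, word_given_topic, n_iter=50):
    """Estimate p(z | image) by EM while keeping p(w | z) fixed.

    word_weights plays the role of word counts in text pLSA; here the
    weights are the feature-word scores (an assumption of this sketch).
    """
    n_topics = len(word_given_topic)
    topic_mix = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        new_mix = [0.0] * n_topics
        for word, weight in word_weights.items():
            # E-step: posterior over topics for this word.
            joint = [topic_mix[z] * word_given_topic[z].get(word, 0.0)
                     for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # word unknown to every topic
            # M-step accumulation, weighted by the feature-word score.
            for z in range(n_topics):
                new_mix[z] += weight * joint[z] / norm
        total = sum(new_mix)
        if total == 0.0:
            break  # no usable words; keep the previous mixture
        topic_mix = [value / total for value in new_mix]
    return topic_mix


def rescore_words(word_weights, word_given_topic, topic_mix):
    """Combine each feature-word score with its topic-based probability."""
    scores = {}
    for word, weight in word_weights.items():
        topic_prob = sum(topic_mix[z] * word_given_topic[z].get(word, 0.0)
                         for z in range(len(topic_mix)))
        scores[word] = weight * topic_prob
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


topic_mix = fold_in_topics(FEATURE_WORD_SCORES, WORD_GIVEN_TOPIC)
rescored = rescore_words(FEATURE_WORD_SCORES, WORD_GIVEN_TOPIC, topic_mix)
for word in sorted(rescored, key=rescored.get, reverse=True):
    print(f"{word:7s} raw={FEATURE_WORD_SCORES[word]:.2f} "
          f"rescored={rescored[word]:.3f}")
```

In this toy run the raw feature-word scores rank "sand" above "coral", while the topic-weighted scores rank "coral" above "sand", because the estimated topic mixture leans toward the sea topic. This is the kind of word-association effect that the abstract attributes to the introduction of topics.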
