research-article

Effective transfer tagging from image to video

Published: 10 May 2013

Abstract

Recent years have witnessed an explosion of user-generated videos on the Web. To make video search effective and efficient, modern video search engines must associate videos with semantic keywords automatically. Most existing video tagging methods can hardly achieve reliable performance because training data are scarce. Abundant well-tagged data, however, are available in other relevant media types (e.g., images). In this article, we propose a novel video tagging framework, termed Cross-Media Tag Transfer (CMTT), which exploits the abundance of well-tagged images to facilitate video tagging. Specifically, we build a “cross-media tunnel” to transfer knowledge from images to videos. To this end, we find an optimal kernel space in which the distribution distance between images and videos is minimized, thereby tackling the domain-shift problem. We then propose a novel cross-media video tagging model that infers tags by exploring the intrinsic local structures of both labeled and unlabeled data and learns reliable video classifiers. An efficient algorithm optimizes the proposed model in an iterative, alternating manner. Extensive experiments demonstrate the superiority of our proposal over state-of-the-art algorithms.
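One common way to quantify a distribution distance between two feature sets in a kernel space is the kernel maximum mean discrepancy (MMD). The sketch below is an illustration only: the abstract does not state which distance CMTT uses, and the RBF kernel, the `gamma` value, the helper names, and the synthetic Gaussian "image" and "video" features are all assumptions made for this example. It computes the biased empirical estimate of the squared MMD and shows that it is larger for a shifted domain than for two samples from the same domain.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma):
    """Biased empirical estimate of the squared maximum mean discrepancy."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


def sample_gaussian(n, dim, mean, rng):
    """Draw n synthetic feature vectors with unit-variance Gaussian entries."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# Hypothetical stand-ins for real image and video features (assumption).
rng = random.Random(0)
image_feats = sample_gaussian(40, 3, 0.0, rng)
video_feats = sample_gaussian(40, 3, 1.5, rng)    # shifted "video" domain
image_feats_2 = sample_gaussian(40, 3, 0.0, rng)  # second draw, same domain

gamma = 0.5  # illustrative bandwidth, not a tuned value
shifted = mmd_squared(image_feats, video_feats, gamma)
matched = mmd_squared(image_feats, image_feats_2, gamma)
identical = mmd_squared(image_feats, image_feats, gamma)

print(f"MMD^2 across domains:    {shifted:.4f}")
print(f"MMD^2 within one domain: {matched:.4f}")
print(f"MMD^2 of a set to itself: {identical:.4f}")
```

Note that minimizing such a distance over kernel parameters alone is degenerate (a nearly constant kernel drives it toward zero), which is one reason a kernel-learning step of this kind is normally coupled with a classification objective rather than optimized in isolation.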


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 9, Issue 2
May 2013
144 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/2457450

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 10 May 2013
• Accepted: 1 February 2013
• Revised: 1 August 2012
• Received: 1 June 2012

Qualifiers

• research-article
• Research
• Refereed
