research-article

Bilateral Correspondence Model for Words-and-Pictures Association in Multimedia-Rich Microblogs

Published: 04 July 2014

Abstract

The amount of multimedia content in microblogs is growing significantly. In certain large systems, more than 20% of microblogs link to a picture or video. The rich semantics of microblogs provide an opportunity to endow images with higher-level semantics beyond object labels. However, this raises new challenges for understanding the association between the multimodal contents of multimedia-rich microblogs. Contrary to the fundamental assumptions of traditional annotation, tagging, and retrieval systems, pictures and words in multimedia-rich microblogs are loosely associated, and a correspondence between pictures and words cannot be established. To address these challenges, we present the first study analyzing and modeling the associations between multimodal contents in microblog streams, aiming to discover multimodal topics by establishing correspondences between pictures and words in microblogs. We first use a data-driven approach to analyze the new characteristics of the words, the pictures, and their association types in microblogs. We then propose a novel generative model called the Bilateral Correspondence Latent Dirichlet Allocation (BC-LDA) model. BC-LDA assigns flexible associations between pictures and words: it allows not only picture-word co-occurrence in both directions, but also single-modal association. This flexible association can best fit the data distribution, so the model can discover various types of joint topics and generate pictures and words from those topics accordingly. We evaluate the model extensively on a large-scale, real-world dataset of multimedia-rich microblogs and demonstrate its advantages in several application scenarios, including image tagging, text illustration, and topic discovery. The experimental results show that the proposed model significantly and consistently outperforms traditional approaches.
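
The abstract describes the association structure only in words, so the following is a toy sketch of what "bilateral correspondence plus single-modal association" could look like as a generative process. It is not the BC-LDA specification: the per-post direction switch, the `p_follow` probability, and every function and variable name are assumptions introduced purely for illustration. In the sketch, one modality leads and draws its topics from the post-level mixture; each token in the other modality either copies the topic of a leading token (correspondence) or draws its own topic from the mixture (single-modal).

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def draw_tokens(n, theta, emit, rng, leader_topics=None, p_follow=0.0):
    """Draw n (topic, token_id) pairs for one modality.

    With probability p_follow a token copies the topic of a random token
    from the leading modality (correspondence); otherwise it draws its
    topic from the post-level mixture theta (single-modal association).
    """
    tokens = []
    for _ in range(n):
        if leader_topics and rng.random() < p_follow:
            topic = rng.choice(leader_topics)
        else:
            topic = sample_index(theta, rng)
        tokens.append((topic, sample_index(emit[topic], rng)))
    return tokens


def generate_post(n_words, n_patches, word_emit, patch_emit, rng,
                  alpha=0.5, p_follow=0.7):
    """Toy generative process for one picture-plus-text post.

    ASSUMPTION: the per-post direction switch and p_follow are hypothetical
    illustration devices, not taken from the paper.
    """
    theta = sample_dirichlet(alpha, len(word_emit), rng)
    words_lead = rng.random() < 0.5  # bilateral: either modality may lead
    if words_lead:
        words = draw_tokens(n_words, theta, word_emit, rng)
        lead = [t for t, _ in words]
        patches = draw_tokens(n_patches, theta, patch_emit, rng, lead, p_follow)
    else:
        patches = draw_tokens(n_patches, theta, patch_emit, rng)
        lead = [t for t, _ in patches]
        words = draw_tokens(n_words, theta, word_emit, rng, lead, p_follow)
    return {"words_lead": words_lead, "words": words, "patches": patches}


# Two topics over a 4-word vocabulary and a 3-entry visual codebook.
word_emit = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
patch_emit = [[0.8, 0.2, 0.0], [0.0, 0.2, 0.8]]
post = generate_post(6, 4, word_emit, patch_emit, random.Random(0))
print(post)
```

Setting `p_follow` to 0 makes the two modalities share only the post-level mixture, while a value near 1 ties almost every following token to a leading one; a real model would have to infer such association types from data rather than fix them by hand.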

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 10, Issue 4
      June 2014
      132 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/2656131

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 July 2014
      • Accepted: 1 February 2014
      • Revised: 1 January 2014
      • Received: 1 August 2013

      Qualifiers

      • research-article
      • Research
      • Refereed