Abstract
The amount of multimedia content in microblogs is growing significantly: in certain large systems, more than 20% of microblogs link to a picture or video. The rich semantics of microblogs provide an opportunity to endow images with higher-level semantics beyond object labels. However, this raises new challenges for understanding the association between the multimodal contents of multimedia-rich microblogs. Pictures and words in such microblogs are only loosely associated, and a correspondence between them cannot be readily established, which violates the fundamental assumptions of traditional annotation, tagging, and retrieval systems. To address these challenges, we present the first study that analyzes and models the associations between multimodal contents in microblog streams, aiming to discover multimodal topics by establishing correspondences between pictures and words. We first use a data-driven approach to analyze the new characteristics of the words, the pictures, and their association types in microblogs. We then propose a novel generative model, Bilateral Correspondence Latent Dirichlet Allocation (BC-LDA). BC-LDA assigns flexible associations between pictures and words: it allows picture-word co-occurrence in both directions as well as single-modal association. This flexibility lets the model fit the data distribution more closely, so that it can discover various types of joint topics and generate pictures and words from those topics accordingly. We evaluate the model extensively on a large-scale dataset of real multimedia-rich microblogs and demonstrate its advantages in several application scenarios, including image tagging, text illustration, and topic discovery. The experimental results show that the proposed model significantly and consistently outperforms traditional approaches.
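To make the idea of bilateral and single-modal association concrete, the following toy sampler sketches one possible reading of it. It is not the BC-LDA model itself, whose exact generative process the abstract does not specify. The function names, the three mode labels, the symmetric Dirichlet prior `alpha`, and the uniform choice of a topic from the other modality are all assumptions made for illustration, loosely following the style of correspondence topic models.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_microblog(n_words, n_visual, word_topics, visual_topics,
                       mode, rng, alpha=0.5):
    """Toy generator for one microblog (illustrative assumption only).

    word_topics[k] / visual_topics[k] are per-topic distributions over
    text words / visual words. `mode` is a hypothetical label:
      'image_to_text': word topics are copied from visual-word topics
      'text_to_image': visual-word topics are copied from word topics
      'independent'  : each modality draws its own topics from theta
    The two copying modes need at least one token in the source modality.
    """
    theta = sample_dirichlet(alpha, len(word_topics), rng)

    def draw_topic():
        return sample_categorical(theta, rng)

    if mode == "image_to_text":
        z_visual = [draw_topic() for _ in range(n_visual)]
        z_word = [rng.choice(z_visual) for _ in range(n_words)]
    elif mode == "text_to_image":
        z_word = [draw_topic() for _ in range(n_words)]
        z_visual = [rng.choice(z_word) for _ in range(n_visual)]
    elif mode == "independent":
        z_word = [draw_topic() for _ in range(n_words)]
        z_visual = [draw_topic() for _ in range(n_visual)]
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Emit observed tokens from the topic assigned to each position.
    words = [sample_categorical(word_topics[z], rng) for z in z_word]
    visual = [sample_categorical(visual_topics[z], rng) for z in z_visual]
    return {"z_word": z_word, "z_visual": z_visual,
            "words": words, "visual_words": visual}
```

In this sketch, the direction of correspondence only determines which modality's topic assignments constrain the other's. A real model would also need an inference procedure, such as Gibbs sampling, to recover topics and association types from observed microblogs.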
Bilateral Correspondence Model for Words-and-Pictures Association in Multimedia-Rich Microblogs