Abstract
Microblogging platforms such as Twitter have played an important role in recent cultural, social and political events. Discovering latent topics in social streams is therefore important for many downstream applications, such as clustering, classification and recommendation. However, traditional topic models that rely on the bag-of-words assumption cannot adequately uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics for understanding microblogs; we refer to these semantics as auxiliary semantics. In this article, we address these issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We then extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. We propose the Multifaceted Topic Model (MfTM) to jointly model latent semantics among the social terms from Twitter, the auxiliary terms from linked Web documents, and named entities. We also capture the temporal characteristics of each topic. We develop an efficient online inference method for MfTM, which allows the model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
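The key idea in the abstract is that a single latent topic is observed through several facets at once: social terms from tweets, auxiliary terms from linked Web documents, and named entities. The sketch below is a hypothetical illustration of that idea only, not MfTM itself. It assumes a plain LDA-style model in which all facets of a document share one topic mixture, each topic keeps a separate word distribution per facet, and inference is batch collapsed Gibbs sampling; it omits the temporal component and the online inference the article describes. The function name `gibbs_multifacet`, the facet names, the hyperparameter defaults and the toy documents are all invented for illustration.

```python
import random
from collections import defaultdict


def gibbs_multifacet(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for a multi-facet topic model (hypothetical).

    docs: list of dicts mapping a facet name (e.g. "social") to a token list.
    All facets of a document share one topic mixture; each topic keeps a
    separate word distribution per facet. Returns counts[facet][topic][word].
    """
    rng = random.Random(seed)
    facets = sorted({f for doc in docs for f in doc})
    vocab_size = {f: len({w for doc in docs for w in doc.get(f, [])}) for f in facets}
    n_dk = [[0] * num_topics for _ in docs]        # doc-topic counts, all facets pooled
    n_kfw = {f: [defaultdict(int) for _ in range(num_topics)] for f in facets}
    n_kf = {f: [0] * num_topics for f in facets}   # token totals per (facet, topic)
    z = []                                         # z[d][facet][i] = topic of token i

    def update(d, f, w, k, delta):
        # Add (delta=+1) or remove (delta=-1) one token from all count tables.
        n_dk[d][k] += delta
        n_kfw[f][k][w] += delta
        n_kf[f][k] += delta

    # Random initialisation of every token's topic.
    for d, doc in enumerate(docs):
        z.append({})
        for f, tokens in doc.items():
            z[d][f] = [rng.randrange(num_topics) for _ in tokens]
            for w, k in zip(tokens, z[d][f]):
                update(d, f, w, k, +1)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for f, tokens in doc.items():
                for i, w in enumerate(tokens):
                    update(d, f, w, z[d][f][i], -1)  # exclude the current token
                    # p(z=k | rest) proportional to
                    # (n_dk + alpha) * (n_kfw + beta) / (n_kf + V_f * beta)
                    weights = [
                        (n_dk[d][k] + alpha) * (n_kfw[f][k][w] + beta)
                        / (n_kf[f][k] + vocab_size[f] * beta)
                        for k in topics
                    ]
                    k_new = rng.choices(topics, weights=weights)[0]
                    z[d][f][i] = k_new
                    update(d, f, w, k_new, +1)
    return n_kfw


# Invented toy data: two themes, each document with three facets.
docs = [
    {"social": ["goal", "match", "fans"], "auxiliary": ["league", "score"], "entity": ["Arsenal"]},
    {"social": ["goal", "fans"], "auxiliary": ["score", "season"], "entity": ["Arsenal", "Chelsea"]},
    {"social": ["vote", "debate"], "auxiliary": ["election", "poll"], "entity": ["Obama"]},
    {"social": ["vote", "poll"], "auxiliary": ["election", "senate"], "entity": ["Obama", "Congress"]},
]
counts = gibbs_multifacet(docs, num_topics=2)
for k in range(2):
    top = sorted(counts["entity"][k].items(), key=lambda kv: -kv[1])[:2]
    print("topic", k, "top entities:", top)
```

In this sketch the facets are coupled only through the shared document-topic counts `n_dk`, so an entity such as "Arsenal" is pulled toward whichever topic its co-occurring social and auxiliary terms favour; fitting a separate model per facet would lose that coupling.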
Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter