research-article

Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter

Published: 17 December 2014

Abstract

Microblogging platforms, such as Twitter, have played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics for understanding microblogs, and we refer to these semantics collectively as auxiliary semantics. In this article, we address these issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We further extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. The Multifaceted Topic Model (MfTM) is then proposed to jointly model latent semantics among the social terms from Twitter, auxiliary terms from the linked Web documents and named entities. Moreover, we capture the temporal characteristics of each topic. An efficient online inference method for MfTM is developed, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
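The abstract mentions modeling the social chatter associated with hashtags but does not spell out the procedure. As a rough illustration only, the sketch below shows one common preprocessing idea, pooling the words of all tweets that share a hashtag into a single bag of words. This is an assumption for illustration, not the authors' method; the function name `pool_by_hashtag`, the regular expressions and the sample tweets are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical patterns: a hashtag is '#' followed by word characters,
# and a token is a run of lowercase letters, digits or apostrophes.
HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def pool_by_hashtag(tweets):
    """Group tweet tokens into one bag of words (a Counter) per hashtag."""
    pools = defaultdict(Counter)
    for text in tweets:
        lowered = text.lower()
        tags = set(HASHTAG_RE.findall(lowered))
        # Drop URLs, hashtags and mentions before tokenizing the remaining words.
        stripped = re.sub(r"https?://\S+|[#@]\w+", " ", lowered)
        tokens = TOKEN_RE.findall(stripped)
        # A tweet with several hashtags contributes its words to each pool.
        for tag in tags:
            pools[tag].update(tokens)
    return dict(pools)


# Made-up sample tweets, for illustration only.
tweets = [
    "Polls open today #election",
    "Long queues at the polls #election #vote",
    "New phone launch today #tech",
]
pools = pool_by_hashtag(tweets)
```

Each resulting pool could then serve as a longer pseudo-document for a topic model, which is one way to work around the sparsity of individual tweets. Tweets without any hashtag are simply skipped in this sketch.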



Published in

ACM Transactions on Internet Technology, Volume 14, Issue 4
Special Issue on Foundations of Social Computing
December 2014
143 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/2699996
Editor: Munindar P. Singh

Copyright © 2014 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 December 2014
• Accepted: 1 July 2014
• Revised: 1 April 2014
• Received: 1 November 2013

Qualifiers

• research-article
• Research
• Refereed
