Discovery of latent subcommunities in a blog's readership

Published: 20 July 2010

Abstract

The blogosphere has grown into a mainstream forum for social interaction as well as a commercially attractive source of information and influence. Tools are needed to better understand how the communities that adhere to individual blogs are constituted, both to facilitate new personal, socially focused browsing paradigms and to understand how blog content is consumed, which is of interest to blog authors, big media, and search. We present a novel approach to blog subcommunity characterization: we model individual blog readers as mixtures of topics from Ngram Topic over Time (NTOT), an extension to the LDA family that jointly models phrases and time, and cluster the readers with a number of similarity measures using Affinity Propagation. We experiment with two datasets: a small set of blogs whose authors provide feedback, and a set of popular, highly commented blogs, which provides indicators of algorithm scalability and interpretability without prior knowledge of a given blog. The results offer useful insight to the blog authors about their commenting community, and are observed to offer an integrated perspective on the topics of discussion and the members engaged in those discussions for unfamiliar blogs. Our approach also holds promise as a component of solutions to related problems, such as online entity resolution and role discovery.
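
The clustering step described above can be illustrated with a small sketch. It is not the authors' implementation: NTOT inference is not shown, the reader topic mixtures are invented toy values, and the negative Jensen-Shannon divergence is only one plausible similarity measure (the abstract does not name the measures used). The function names `js_similarity`, `affinity_propagation`, and `cluster_readers` are hypothetical. The preference on the diagonal is set to the median similarity, a common default for Affinity Propagation.

```python
import math
import statistics


def js_similarity(p, q):
    """Negative Jensen-Shannon divergence (natural log); 0 means identical."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return -(0.5 * kl(p, m) + 0.5 * kl(q, m))


def affinity_propagation(S, damping=0.7, iterations=200):
    """Message-passing Affinity Propagation on a square similarity matrix S
    (at least 2 points) whose diagonal holds the exemplar preferences.
    Returns, for each point, the index of its exemplar."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: how much support k has gathered from other points.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(support) - support[k]  # positive support from i != k
            for i in range(n):
                if i == k:
                    new = others
                else:
                    new = min(0.0, R[k][k] + others - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [None] * n
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def cluster_readers(mixtures):
    """Cluster readers given one topic mixture (a probability vector) each."""
    n = len(mixtures)
    S = [[js_similarity(mixtures[i], mixtures[j]) for j in range(n)]
         for i in range(n)]
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    preference = statistics.median(off_diag)  # assumed default, not from the paper
    for i in range(n):
        S[i][i] = preference
    return affinity_propagation(S)


# Toy data (invented): six readers, three topics, two visibly different groups.
mixtures = [
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.10, 0.30, 0.60],
    [0.08, 0.22, 0.70],
]
print(cluster_readers(mixtures))
```

Each returned label is the index of the reader chosen as exemplar for that reader's cluster, so the number of subcommunities does not have to be fixed in advance; lowering the preference tends to yield fewer clusters.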

      Reviews

      Jeanine M. Meyer

      Adams et al. describe tools for characterizing subgroups among people who make comments on blogs. The methods involve mathematical analysis of the texts. The particular tool developed by the authors is called Ngram Topic over Time (NTOT). It detects phrases as opposed to single words, using sequences of text postings over time to generate the findings. The techniques build on prior approaches, namely latent Dirichlet allocation (LDA), affinity propagation, and the Gibbs procedure, and appear to deliver reasonable results. The modeling of time seems particularly relevant to understanding the readership of blogs. This analysis consists of two phases: the first determines topics, and the second divides the people making comments into groups, based on what topics they respond to with their posts. Initially, the authors applied the NTOT technique, along with others, to a small set of blogs, with the authors of these blogs providing feedback on the reasonableness of the topics. Similarly, they used surveys and interviews to test the validity of the discovered clusters of readers. The authors then went on to apply the techniques to a larger set. They had to cope with factors such as anonymous posts and aliases. It appears that the techniques can scale up, but they are still computationally intensive. This is heavy-duty mathematics, and would be fairly daunting for many readers. The figures could be larger, and could include more detailed explanations. However, the authors do an excellent job of exposition, so most readers will gain some understanding of the techniques and the overall goals.

      Online Computing Reviews Service
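
The hand-off between the two phases mentioned in the review can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the topic phase yields, for every comment, a count of words assigned to each topic, and it simply sums and normalizes those counts per commenter. The function name `reader_topic_mixtures`, the `skip` parameter, and the rule of dropping posts signed "anonymous" are all hypothetical; the review says only that anonymous posts and aliases had to be dealt with, not how.

```python
from collections import defaultdict


def reader_topic_mixtures(comments, num_topics, skip=("anonymous",)):
    """Aggregate per-comment topic counts into one topic mixture per commenter.

    comments: iterable of (author, topic_counts) pairs, where topic_counts
    has one non-negative number per topic.
    """
    totals = defaultdict(lambda: [0.0] * num_topics)
    for author, topic_counts in comments:
        if author.lower() in skip:  # assumed handling of anonymous posts
            continue
        for topic, count in enumerate(topic_counts):
            totals[author][topic] += count
    mixtures = {}
    for author, counts in totals.items():
        z = sum(counts)
        if z > 0:  # commenters with no topic-bearing words are left out
            mixtures[author] = [c / z for c in counts]
    return mixtures


# Invented example: two topics, three commenters, one of them anonymous.
comments = [
    ("ann", [3, 1]),
    ("ann", [1, 3]),
    ("bob", [0, 4]),
    ("Anonymous", [2, 2]),
]
print(reader_topic_mixtures(comments, num_topics=2))
```

The resulting vectors are the per-reader inputs that the second phase would compare and cluster.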

      • Published in

        ACM Transactions on the Web, Volume 4, Issue 3
        July 2010
        166 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/1806916

        Copyright © 2010 ACM

        Publisher

        Association for Computing Machinery
        New York, NY, United States

        Publication History

        • Published: 20 July 2010
        • Accepted: 1 December 2009
        • Revised: 1 July 2009
        • Received: 1 November 2008

        Qualifiers

        • research-article
        • Research
        • Refereed
