Abstract
The blogosphere has grown into a mainstream forum of social interaction as well as a commercially attractive source of information and influence. Tools are needed to better understand how the communities that adhere to individual blogs are constituted, both to facilitate new personal, socially focused browsing paradigms and to understand how blog content is consumed, which is of interest to blog authors, big media, and search. We present a novel approach to blog subcommunity characterization: we model individual blog readers using mixtures of an extension to the LDA family that jointly models phrases and time, Ngram Topic over Time (NTOT), and cluster the readers under a number of similarity measures using Affinity Propagation. We experiment with two datasets: a small set of blogs whose authors provide feedback, and a set of popular, highly commented blogs, which provide indicators of algorithm scalability and of interpretability without prior knowledge of a given blog. The results offer useful insight to blog authors about their commenting community and, for unfamiliar blogs, are observed to offer an integrated perspective on the topics of discussion and the members engaged in those discussions. Our approach also holds promise as a component of solutions to related problems, such as online entity resolution and role discovery.
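The clustering step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: NTOT itself is not reproduced, the reader topic mixtures below are made-up values standing in for its output, and negative Jensen-Shannon divergence is only one plausible similarity measure (the abstract does not name the measures used). The function names and the median preference setting are likewise assumptions chosen for illustration.

```python
import math
from statistics import median


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain Affinity Propagation on a similarity matrix S (needs at least 2 points).

    S[k][k] holds the preference of point k. Returns, for each point,
    the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + positive support from points other than i and k)
        # a(k,k) = positive support from all points other than k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:
        # Fallback (an assumption of this sketch): keep the single best candidate.
        exemplars = [max(range(n), key=lambda k: score[k])]
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always represents itself
    return labels


# Hypothetical reader-by-topic mixtures (each row sums to 1).
# In the paper's setting these would be inferred by NTOT from reader comments.
readers = {
    "reader_a": [0.80, 0.15, 0.05],
    "reader_b": [0.75, 0.20, 0.05],
    "reader_c": [0.85, 0.05, 0.10],
    "reader_d": [0.10, 0.10, 0.80],
    "reader_e": [0.05, 0.20, 0.75],
    "reader_f": [0.15, 0.05, 0.80],
}
names = list(readers)
n = len(names)

# Similarity = negative divergence, so more similar readers get larger values.
S = [[-js_divergence(readers[a], readers[b]) for b in names] for a in names]

# Preference: median off-diagonal similarity, a common default (assumed here).
pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for k in range(n):
    S[k][k] = pref

labels = affinity_propagation(S)
for name, lab in zip(names, labels):
    print(name, "->", names[lab])
```

Each reader is printed next to the exemplar reader chosen to represent its cluster. The number of clusters is not fixed in advance; it depends on the preference value, so lowering `pref` tends to produce fewer, larger subcommunities. Swapping `js_divergence` for another measure changes only the construction of `S`.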
Index Terms
Discovery of latent subcommunities in a blog's readership