An online blog reading system by topic clustering and personalized ranking

Published: 30 July 2009

Abstract

There is an increasing number of people reading, writing, and commenting on blogs. According to a recent survey by Technorati, about 75,000 new blogs and 1.2 million new posts appear every day. However, it is difficult and time-consuming for a blog reader to find the most interesting posts in the huge and dynamic blog world. In this article, an online Personalized Blog Reader (PBR) system is proposed, which helps blog readers browse the coolest and newest blog posts matching their interests by automatically clustering the most relevant stories. PBR aims to always rank a user's potential favorite topics higher than nonfavorite ones. This is accomplished in the following steps. First, the system collects posts from different blogs and provides a unified incremental index of them. Then, an incremental clustering algorithm with a flexible half-bounded window of observation is proposed to satisfy the requirements of online processing. Finally, the system learns people's personalized reading preferences to present a user with a final reading list. The experimental results show that the proposed incremental clustering algorithm is effective and efficient, and that the personalization of the PBR performs well.
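
The abstract gives no algorithmic detail, so the following is only a rough sketch of how a pipeline of this general shape could look, not the PBR algorithm itself. It assumes bag-of-words vectors, cosine similarity, a single-pass threshold rule for assigning posts to clusters, and a plain age limit that drops clusters not updated recently. That age limit is a stand-in: the abstract does not define the half-bounded window. The names `IncrementalClusterer`, `rank_clusters`, `threshold`, and `window` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering of a post stream (illustrative only)."""

    def __init__(self, threshold=0.3, window=100):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.window = window        # assumed maximum age, counted in posts seen
        self.clusters = []
        self.clock = 0

    def add(self, post_id, text):
        """Assign one incoming post to a cluster and return that cluster."""
        self.clock += 1
        vec = Counter(text.lower().split())
        # Forget clusters that have not received a post within the window.
        self.clusters = [
            c for c in self.clusters
            if self.clock - c["last_seen"] <= self.window
        ]
        best, best_sim = None, 0.0
        for c in self.clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= self.threshold:
            best["centroid"].update(vec)  # accumulate term counts
            best["posts"].append(post_id)
            best["last_seen"] = self.clock
        else:
            best = {"centroid": Counter(vec), "posts": [post_id],
                    "last_seen": self.clock}
            self.clusters.append(best)
        return best


def rank_clusters(clusters, profile):
    """Order clusters by similarity to a user's term profile, best first."""
    return sorted(clusters,
                  key=lambda c: cosine(profile, c["centroid"]),
                  reverse=True)


clusterer = IncrementalClusterer(threshold=0.3, window=100)
clusterer.add("p1", "football match results tonight")
clusterer.add("p2", "python clustering tutorial")
clusterer.add("p3", "clustering tutorial python code")

# A user profile built, for example, from terms in posts the user has read.
reading_list = rank_clusters(clusterer.clusters, Counter({"python": 1}))
```

Because each post is compared only against the current cluster centroids, the cost per post depends on the number of live clusters rather than on the full history, which is the usual motivation for an incremental scheme in an online setting.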

Reviews

Fazli Can

The authors propose a personalized blog reader (PBR) system that helps bloggers find interesting posts. Their proposal includes an online incremental clustering algorithm that uses blog link and content information, and, for personalized blog ranking, a method to learn bloggers' reading habits.

Li et al. measure various aspects of their system. For example, they assess the consistency of the results of various possible clustering approaches with a ground truth. For this purpose, they use the Rand index, although a well-known and more reliable version of this statistic exists that eliminates the chance factor [1]. The authors also evaluate the effects of some parameters on the clustering runtime efficiency, and present the results of a user study on the subjective evaluation of their system.

This fast-paced paper introduces many things. It includes some unnecessary details, such as the formula for the Rand index. Also, some of the sentences are awkward and long. The paper is interesting. Blogging is a popular Web activity, and studies such as this one are needed.

Online Computing Reviews Service
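
To make the reviewer's point about the Rand index concrete, here is a small sketch that computes the plain Rand index alongside one widely used chance-corrected form, the adjusted Rand index. Whether the adjusted Rand index is the version the reviewer's citation [1] refers to is an assumption, and the function names and toy labels are illustrative only.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # both together, or both apart
            agree += 1
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for the agreement expected by chance."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs placed together in both clusterings, from the contingency table.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one big cluster
    return (together_both - expected) / (maximum - expected)


ground_truth = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]
ri = rand_index(ground_truth, predicted)            # 5 of 6 pairs agree
ari = adjusted_rand_index(ground_truth, predicted)  # 4/7 after chance correction
```

On this toy input the plain index is 5/6 while the adjusted index is 4/7, which shows how much of the raw agreement can come from pairs that most clusterings would separate anyway.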
