Abstract
A growing number of people read, write, and comment on blogs. According to a recent survey by Technorati, about 75,000 new blogs and 1.2 million new posts appear every day. As a result, it is difficult and time-consuming for a blog reader to find the most interesting posts in the huge and dynamic blog world. This article proposes an online Personalized Blog Reader (PBR) system, which helps blog readers browse the coolest and newest blog posts matching their interests by automatically clustering the most relevant stories. PBR aims to always rank a user's potential favorite topics above the other topics. This is accomplished in three steps. First, the system collects posts from different blogs and provides a unified incremental index of them. Second, an incremental clustering algorithm with a flexible half-bounded window of observation groups the posts into topics while satisfying the requirements of online processing. Finally, the system learns each user's personalized reading preferences and uses them to present the user with a final reading list. The experimental results show that the proposed incremental clustering algorithm is effective and efficient, and that the personalization of PBR performs well.
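The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the half-bounded window, the similarity measure, or the preference model. The sketch assumes bag-of-words term-frequency vectors, cosine similarity, a single-pass clustering rule with a fixed similarity `threshold`, and a simple recency `window` that stops old clusters from absorbing new posts. All names (`IncrementalClusterer`, `rank_clusters`, `threshold`, `window`) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a post into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering over a post stream (illustrative assumption,
    not the paper's half-bounded window algorithm)."""

    def __init__(self, threshold=0.3, window=3):
        self.threshold = threshold  # minimum similarity to join a cluster
        self.window = window        # max age (in time units) of a joinable cluster
        self.clusters = []          # dicts with "centroid", "posts", "last_seen"

    def add(self, text, timestamp):
        vec = vectorize(text)
        # Only clusters updated within the window are candidates.
        active = [c for c in self.clusters
                  if timestamp - c["last_seen"] <= self.window]
        best = max(active, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= self.threshold:
            # Summed term counts act as the centroid; cosine ignores scale.
            best["centroid"].update(vec)
            best["posts"].append(text)
            best["last_seen"] = timestamp
            return best
        # No sufficiently similar recent cluster: start a new topic.
        cluster = {"centroid": Counter(vec), "posts": [text], "last_seen": timestamp}
        self.clusters.append(cluster)
        return cluster


def rank_clusters(clusters, profile):
    """Order clusters by similarity to a user's interest profile vector."""
    return sorted(clusters, key=lambda c: cosine(profile, c["centroid"]),
                  reverse=True)
```

In this sketch each incoming post is compared only against recently updated clusters, so the cost per post depends on the number of active topics rather than on the full history. The user profile is modeled as just another term vector, which is one simple way to rank likely favorite topics first.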
An online blog reading system by topic clustering and personalized ranking