ABSTRACT
Traditional link-based web ranking schemes use a single score to measure a page's authority, without regard to the community from which that authority is derived. As a result, a resource that is highly popular for one topic may dominate the results of another topic in which it is less authoritative. To address this problem, we propose calculating a score vector for each page that distinguishes the contributions from different topics, using a random walk model that probabilistically combines page topic distribution and link structure. We show how to incorporate the topical model into both PageRank and HITS without changing their overall properties, while still providing insight into topic-level transitions. Experiments on multiple datasets indicate that our technique outperforms other ranking approaches that incorporate textual analysis.
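To make the random-walk idea concrete, here is a minimal Python sketch of a topic-aware PageRank, written under stated assumptions rather than as the authors' exact formulation. The names `topical_pagerank`, the topic-distribution matrix `C`, the damping factor `d`, the topic-stay probability `alpha`, and the rule that dangling pages link to every page are all hypothetical illustrative choices. The surfer either keeps its current topic when following a link or re-draws a topic from the target page's distribution, so each page ends up with one score per topic. Because every row of `C` sums to 1, summing a page's score vector recovers its ordinary PageRank, which is one way the "overall property" can be preserved. Only the PageRank case is sketched here; a HITS analogue is not shown.

```python
"""Sketch of a topic-aware PageRank (illustrative assumptions, not the paper's code).

Assumed model:
  - C[v][k] is the probability that page v is about topic k (rows sum to 1).
  - With probability d the surfer follows a random out-link of page u.
    It keeps its current topic with probability alpha; otherwise it draws
    a new topic from the target page's distribution C[v].
  - With probability 1 - d it jumps to a uniformly random page v and draws
    a topic from C[v].
  - Dangling pages (no out-links) are treated as linking to every page.
"""
from typing import Dict, List


def topical_pagerank(
    links: Dict[int, List[int]],  # page -> pages it links to
    C: List[List[float]],         # C[v][k]: topic distribution of page v
    d: float = 0.85,              # hypothetical damping factor
    alpha: float = 0.85,          # hypothetical "stay on topic" probability
    iters: int = 100,
) -> List[List[float]]:
    n, t = len(C), len(C[0])
    # Start from uniform page mass, split by each page's own topics.
    R = [[C[v][k] / n for k in range(t)] for v in range(n)]
    for _ in range(iters):
        # Teleport term: random page, topic drawn from that page's content.
        new = [[(1 - d) / n * C[v][k] for k in range(t)] for v in range(n)]
        for u in range(n):
            targets = links.get(u) or list(range(n))
            total_u = sum(R[u])  # page u's overall score
            share = d / len(targets)
            for v in targets:
                for k in range(t):
                    # Follow-stay keeps topic k; follow-jump re-draws from C[v].
                    new[v][k] += share * (alpha * R[u][k]
                                          + (1 - alpha) * C[v][k] * total_u)
        R = new
    return R


def pagerank(links: Dict[int, List[int]], n: int,
             d: float = 0.85, iters: int = 100) -> List[float]:
    """Plain PageRank with the same dangling-page rule, for comparison."""
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for u in range(n):
            targets = links.get(u) or list(range(n))
            for v in targets:
                new[v] += d * r[u] / len(targets)
        r = new
    return r


if __name__ == "__main__":
    links = {0: [1, 2], 1: [2], 2: [0], 3: []}
    C = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
    R = topical_pagerank(links, C)
    pr = pagerank(links, len(C))
    for v, vec in enumerate(R):
        print(v, [round(x, 4) for x in vec], round(sum(vec), 4), round(pr[v], 4))
```

In the usage example, each printed row shows a page's per-topic scores next to their sum and the plain PageRank value; the two should agree up to floating-point error. The nested loops keep the sketch readable; a practical implementation would use sparse matrices instead.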
Topical link analysis for web search