DOI: 10.1145/1148170.1148189
Article

Topical link analysis for web search

Published: 06 August 2006

ABSTRACT

Traditional link-based web ranking schemes use a single score to measure a page's authority, without regard to the community from which that authority is derived. As a result, a resource that is highly popular for one topic may dominate the results of another topic in which it is less authoritative. To address this problem, we suggest calculating a score vector for each page to distinguish the contributions from different topics, using a random walk model that probabilistically combines page topic distribution and link structure. We show how to incorporate the topical model within both PageRank and HITS without affecting their overall properties, while still providing insight into topic-level transitions. Experiments on multiple datasets indicate that our technique outperforms other ranking approaches that incorporate textual analysis.
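
The abstract does not give the model's equations, so the following is only a minimal sketch of how a per-page topic score vector could be propagated by a random walk, not the authors' formulation. It assumes a hypothetical update rule in which a surfer who follows a link keeps the current topic with probability `alpha`, or otherwise re-draws a topic from the source page's content distribution. The names `topical_pagerank`, `pagerank`, `alpha`, and `content`, the toy graph, and the topic distributions are all illustrative assumptions.

```python
from typing import Dict, List


def topical_pagerank(
    links: Dict[int, List[int]],
    content: List[List[float]],
    d: float = 0.85,
    alpha: float = 0.5,
    iters: int = 100,
) -> List[List[float]]:
    """Return an N x T matrix; row v holds page v's per-topic scores.

    Assumptions (not taken from the paper): every page has at least one
    outlink, each row of `content` sums to 1, and `alpha` is the chance
    that a surfer who follows a link keeps the current topic.
    """
    n, t = len(content), len(content[0])
    # Start from the content distributions, scaled so all scores sum to 1.
    scores = [[c / n for c in row] for row in content]
    for _ in range(iters):
        # Random-jump term: land on a uniformly chosen page and pick a
        # topic from that page's content distribution.
        new = [[(1 - d) * content[v][k] / n for k in range(t)] for v in range(n)]
        for u, targets in links.items():
            total_u = sum(scores[u])
            for k in range(t):
                # Keep topic k with probability alpha, or re-draw the
                # topic from the content of the source page u.
                flow = alpha * scores[u][k] + (1 - alpha) * content[u][k] * total_u
                share = d * flow / len(targets)
                for v in targets:
                    new[v][k] += share
        scores = new
    return scores


def pagerank(links: Dict[int, List[int]], n: int, d: float = 0.85,
             iters: int = 100) -> List[float]:
    """Plain single-score PageRank on the same graph, for comparison."""
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for u, targets in links.items():
            for v in targets:
                new[v] += d * rank[u] / len(targets)
        rank = new
    return rank


# Toy graph: 3 pages, 2 topics (hypothetical data).
links = {0: [1, 2], 1: [2], 2: [0]}
content = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
scores = topical_pagerank(links, content)
plain = pagerank(links, len(content))
for v, row in enumerate(scores):
    print(v, [round(x, 4) for x in row], round(sum(row), 4), round(plain[v], 4))
```

Under these assumptions, summing a page's topic scores reproduces its ordinary PageRank value, which is one way to read the abstract's claim that the topical model leaves the overall ranking property intact while exposing how authority splits across topics.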

Published in

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN: 1595933697
DOI: 10.1145/1148170

Copyright © 2006 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%