DOI: 10.1145/988672.988714
Article

Ranking the web frontier

Published: 17 May 2004

ABSTRACT

The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving prominent features of the web, and propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. We suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.


Published in

WWW '04: Proceedings of the 13th international conference on World Wide Web
May 2004
754 pages
ISBN: 158113844X
DOI: 10.1145/988672

Copyright © 2004 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,899 of 8,196 submissions, 23%
