DOI: 10.1145/1031171.1031248 · CIKM Conference Proceedings (ACM)
Article

Local methods for estimating pagerank values

Published: 13 November 2004

ABSTRACT

The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation.
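For reference, the global computation mentioned above is commonly carried out as a power iteration over the whole link graph, repeatedly applying r(v) = (1 - d)/N + d · Σ_{u→v} r(u)/outdeg(u). The sketch below is a minimal illustration under stated assumptions: a damping factor d = 0.85 and uniform redistribution of the rank held by pages without out-links. The function name `pagerank` and its parameters are hypothetical and are not code from the paper. Each iteration touches every edge, which is why this computation becomes expensive on a web-scale graph.

```python
from typing import Dict, List


def pagerank(out_links: Dict[int, List[int]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[int, float]:
    """Global PageRank by power iteration (hypothetical sketch, not the paper's code)."""
    # Collect every page, including pages that only appear as link targets.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes a damped share of its rank along every out-link.
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # stop once the L1 change is negligible
            break
    return rank
```

On the three-page graph `{0: [1, 2], 1: [2], 2: [0]}`, page 2 ends with the highest score (about 0.397) and the scores sum to 1.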

However, in some scenarios, for example online analysis of link evolution or mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs disk lookups to retrieve node and connectivity information. We show that in most cases a reasonable estimate of a node's PageRank value can be obtained by retrieving only a moderate number of nodes in its local neighborhood.
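One way to picture the local setting is to expand only the in-link neighborhood of the target page up to a fixed radius, fetching each page's in-links and out-degree from the link database, and then to iterate the PageRank recurrence on that subgraph alone. The sketch below assumes this generic scheme; it is not necessarily one of the estimators studied in the paper. `in_links` and `out_degree` are hypothetical callables standing in for remote link-database lookups, pages on the radius boundary are simply frozen at the average score 1/N, and the correction for pages without out-links is omitted. The number of `in_links` calls is the lookup cost that a larger radius trades for accuracy.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def estimate_pagerank(target: int,
                      in_links: Callable[[int], Iterable[int]],
                      out_degree: Callable[[int], int],
                      num_pages: int,
                      radius: int = 2,
                      damping: float = 0.85,
                      iterations: int = 100) -> float:
    """Estimate one page's PageRank from its bounded in-link neighborhood.

    Hypothetical sketch: `in_links`/`out_degree` model remote link-database
    lookups; boundary pages keep the assumed average score 1/N.
    """
    # Breadth-first expansion backwards along in-links, recording distances.
    dist: Dict[int, int] = {target: 0}
    preds: Dict[int, List[int]] = {}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # boundary page: its in-links are never fetched
        preds[v] = list(in_links(v))  # one link-database lookup
        for u in preds[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    default = 1.0 / num_pages
    rank = {v: default for v in dist}
    outdeg = {v: out_degree(v) for v in dist}  # >= 1 for any page linking in
    # Iterate the recurrence only on pages whose in-links were fetched.
    for _ in range(iterations):
        new = dict(rank)  # boundary pages keep their default estimate
        for v, ps in preds.items():
            new[v] = (1.0 - damping) / num_pages + damping * sum(
                rank[u] / outdeg[u] for u in ps)
        rank = new
    return rank[target]
```

On a graph in which every page has out-links, the estimate approaches the global value once the radius exceeds the distance to every page that can reach the target; with radius 0 it degenerates to the default 1/N.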
