ABSTRACT
The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation.
However, in some scenarios, including online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs disk lookups to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.
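The abstract does not spell out the estimation methods, so the following is only a minimal sketch of one plausible local approach, not the authors' algorithm. It assumes a hypothetical link-database interface (`in_links`, `out_degree`), walks backwards over in-links from the target until `max_nodes` pages have been retrieved, and then iterates the standard PageRank update on that subgraph. Every page outside the subgraph is assumed to hold the average score `1 / n_total`, and rank leaked by dangling pages is ignored. The toy graph `OUT`, the damping factor `d = 0.85`, and all function names are illustrative assumptions.

```python
from collections import deque

# Toy graph (assumption for illustration): page -> pages it links to.
OUT = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Reverse index, standing in for a remote link database.
IN = {v: [] for v in OUT}
for u, targets in OUT.items():
    for v in targets:
        IN[v].append(u)


def in_links(v):
    """Hypothetical lookup: pages that link to v."""
    return IN.get(v, [])


def out_degree(u):
    """Hypothetical lookup: number of out-links of u."""
    return len(OUT[u])


def global_pagerank(out_links, d=0.85, iters=100):
    """Baseline: power iteration over the whole graph."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for u in nodes:
            # A dangling page spreads its rank uniformly over all pages.
            targets = out_links[u] or nodes
            share = d * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank


def local_estimate(target, in_links, out_degree, n_total,
                   max_nodes, d=0.85, iters=100):
    """Estimate the PageRank of `target` from a backward neighborhood."""
    # Breadth-first expansion over in-links, capped at max_nodes pages.
    seen = {target}
    queue = deque([target])
    while queue and len(seen) < max_nodes:
        v = queue.popleft()
        for u in in_links(v):
            if u not in seen and len(seen) < max_nodes:
                seen.add(u)
                queue.append(u)

    # Assumption: pages outside the subgraph keep the average score.
    default = 1.0 / n_total
    rank = {v: default for v in seen}
    for _ in range(iters):
        new = {}
        for v in seen:
            incoming = sum(rank.get(u, default) / out_degree(u)
                           for u in in_links(v))
            new[v] = (1.0 - d) / n_total + d * incoming
        rank = new
    return rank[target]


if __name__ == "__main__":
    exact = global_pagerank(OUT)["c"]
    for k in (1, 2, 4):
        approx = local_estimate("c", in_links, out_degree, len(OUT), k)
        print(k, round(approx, 4), round(exact, 4))
```

In this sketch the estimate coincides with the global score once the subgraph covers every page that can reach the target (and the graph has no dangling pages); with a smaller budget, the quality depends on how well the average-score assumption fits the boundary pages.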
Index Terms
Local methods for estimating pagerank values