ABSTRACT
We present a framework for approximating random-walk-based probability distributions over Web pages using graph aggregation. We (1) partition the Web's graph into classes of quasi-equivalent vertices, (2) project the page-based random walk to be approximated onto those classes, and (3) compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web graph containing over 1.4 billion pages and were able to produce a ranking that has a Spearman rank-order correlation of 0.95 with respect to PageRank. A simplistic implementation of our method required less than half the running time of a highly optimized implementation of PageRank, which suggests that larger speedup factors are probably possible.
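The three steps above, followed by the reconstruction, can be illustrated on a toy graph. The following is a minimal sketch only, not the authors' implementation. It assumes host-based classes, a host-level walk weighted by the number of inter-host links, uniform teleportation with a damping factor of 0.85, and a placeholder reconstruction rule that splits each host's probability among its pages in proportion to (1 + in-link count). The abstract specifies none of these details; the function names `stationary` and `aggregated_rank` are hypothetical.

```python
from collections import defaultdict


def stationary(weights, nodes, damping=0.85, iters=100):
    """Power iteration for a damped random walk on a weighted graph.

    weights[u][v] is the unnormalized weight of the step u -> v.
    The damping factor and uniform teleportation are assumptions.
    """
    n = len(nodes)
    prob = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * prob[u] / n
            else:
                for v, w in out.items():
                    nxt[v] += damping * prob[u] * w / total
        prob = nxt
    return prob


def aggregated_rank(links, host_of, damping=0.85, iters=100):
    """Approximate a page distribution via a host-level random walk."""
    # Step 1: the partition is given by host_of (page -> host class).
    pages = sorted(host_of)
    hosts = sorted(set(host_of.values()))

    # Step 2: project page links onto hosts by counting links per host pair.
    host_w = defaultdict(lambda: defaultdict(float))
    in_links = defaultdict(int)
    for src, dst in links:
        host_w[host_of[src]][host_of[dst]] += 1.0
        in_links[dst] += 1

    # Step 3: stationary distribution of the class-based walk.
    host_prob = stationary(host_w, hosts, damping, iters)

    # Reconstruction (assumed rule): split each host's mass among its
    # pages in proportion to 1 + number of in-links.
    share = {p: 1 + in_links[p] for p in pages}
    host_total = defaultdict(int)
    for p in pages:
        host_total[host_of[p]] += share[p]
    page_prob = {
        p: host_prob[host_of[p]] * share[p] / host_total[host_of[p]]
        for p in pages
    }
    return host_prob, page_prob


# Toy data: four pages on three hypothetical hosts.
host_of = {
    "a.com/1": "a.com",
    "a.com/2": "a.com",
    "b.com/1": "b.com",
    "c.com/1": "c.com",
}
links = [
    ("a.com/1", "a.com/2"),
    ("a.com/2", "b.com/1"),
    ("b.com/1", "a.com/1"),
    ("c.com/1", "a.com/1"),
    ("c.com/1", "b.com/1"),
]
host_prob, page_prob = aggregated_rank(links, host_of)
for page in sorted(page_prob, key=page_prob.get, reverse=True):
    print(page, round(page_prob[page], 4))
```

The power iteration runs over hosts rather than pages, so its cost depends on the number of classes and inter-class links. That is the intuition behind the speedup the abstract reports, although this toy example says nothing about performance at Web scale.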
Index Terms
Efficient pagerank approximation via graph aggregation