ABSTRACT
The celebrated PageRank algorithm has proved to be a very effective paradigm for ranking the results of web search algorithms. In this paper we refine this basic paradigm to take into account several evolving, prominent features of the web, and we propose several algorithmic innovations. First, we analyze features of the rapidly growing "frontier" of the web, namely the part of the web that crawlers are unable to cover for one reason or another. We analyze the effect of these pages and find it to be significant. Second, we suggest ways to improve the quality of ranking by modeling the growing presence of "link rot" on the web as more sites and pages fall out of maintenance. Finally, we suggest new methods of ranking that are motivated by the hierarchical structure of the web, are more efficient than PageRank, and may be more resistant to direct manipulation.
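For readers unfamiliar with the basic paradigm, the following is a minimal sketch of textbook PageRank by power iteration, not the refined methods proposed in the paper. It assumes a common convention for frontier (dangling) pages: the rank held by a page with no known outlinks is redistributed uniformly over all pages. The function name `pagerank`, the parameters `damping` and `iterations`, and the tiny example graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each crawled page to the list of pages it links to.
    Pages that appear only as link targets (never crawled) are treated
    as dangling: their rank is spread uniformly over all pages.
    This is one common convention, assumed here for illustration.
    """
    # Collect every page, including uncrawled link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank currently sitting on pages with no known outlinks.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page with outlinks splits its damped rank among its targets.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph; "uncrawled" is a frontier page.
example_links = {"home": ["about", "uncrawled"], "about": ["home"]}
example_ranks = pagerank(example_links)
```

In this sketch the scores always sum to 1, because the teleportation term, the redistributed dangling mass, and the rank passed along links together account for all of the rank from the previous iteration.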
Index Terms
Ranking the web frontier