ABSTRACT
PageRank computes the importance of each node in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter is the probability of following an edge of the graph or, when the graph is a network of web pages and links, of clicking a link on a web page; with probability 1 - alpha the surfer instead teleports. We empirically measure the teleportation parameter from browser toolbar logs using a click-trail analysis. For a particular user or machine, this analysis produces a single value of alpha. We find that these values fit a Beta distribution well, with a mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank with the teleportation parameter treated as a random variable following the fitted distribution rather than as a constant. We evaluate these new metrics on the graph of pages in Wikipedia.
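The two computations described above can be sketched in Python with NumPy: per-user values of alpha are summarized by a Beta distribution, and PageRank is then taken over that distribution instead of at one fixed alpha. This is a minimal sketch under stated assumptions, not the paper's implementation. The transition matrix P is assumed column-stochastic with no dangling nodes. The Beta fit uses the method of moments as a simple stand-in for whatever fitting procedure the study used. The expectation over alpha is estimated by Monte Carlo sampling. The helper names (`pagerank`, `fit_beta_moments`, `beta_pagerank_stats`), the toy graph, and the per-user alpha values are hypothetical and serve only as illustration.

```python
import numpy as np


def pagerank(P, alpha, v=None):
    """Solve (I - alpha*P) x = (1 - alpha) v for the PageRank vector x.

    Assumption: P is column-stochastic (no dangling nodes), alpha is the
    edge-following probability, and v is the teleportation distribution
    (uniform by default).
    """
    n = P.shape[0]
    if v is None:
        v = np.full(n, 1.0 / n)
    x = np.linalg.solve(np.eye(n) - alpha * P, (1.0 - alpha) * v)
    return x / x.sum()  # x already sums to 1; this only guards against round-off


def fit_beta_moments(samples):
    """Method-of-moments Beta(a, b) fit to per-user edge-following probabilities.

    Uses mean = a/(a+b) and var = mean*(1-mean)/(a+b+1); requires var < mean*(1-mean).
    """
    s = np.asarray(samples, dtype=float)
    m, var = s.mean(), s.var()
    common = m * (1.0 - m) / var - 1.0  # equals a + b
    return m * common, (1.0 - m) * common


def beta_pagerank_stats(P, a, b, n_samples=2000, seed=0, v=None):
    """Monte Carlo mean and standard deviation of x(A) with A ~ Beta(a, b)."""
    rng = np.random.default_rng(seed)
    # Keep alpha strictly below 1 so that I - alpha*P stays nonsingular.
    alphas = np.clip(rng.beta(a, b, size=n_samples), 0.0, 1.0 - 1e-9)
    X = np.array([pagerank(P, alpha, v) for alpha in alphas])
    return X.mean(axis=0), X.std(axis=0)


if __name__ == "__main__":
    # Hypothetical 4-node graph: column j marks the out-links of node j.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    P = A / A.sum(axis=0)  # normalize each column to sum to 1
    # Made-up per-user alpha values for illustration, not measured data.
    observed = [0.35, 0.50, 0.62, 0.41, 0.55, 0.48]
    a, b = fit_beta_moments(observed)
    mean, std = beta_pagerank_stats(P, a, b)
    print("Beta fit:", a, b)
    print("PageRank at mean alpha:", pagerank(P, a / (a + b)))
    print("Expected PageRank:", mean)
    print("Std. dev. over alpha:", std)
```

Comparing `mean` with `pagerank(P, a / (a + b))` shows how averaging over alpha can differ from plugging in a single mean value. `std` gives a rough per-node measure of how sensitive each score is to the spread of alpha.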
Tracking the random surfer: empirically measured teleportation parameters in PageRank