DOI: 10.1145/1772690.1772730 (WWW conference proceedings)
research-article

Tracking the random surfer: empirically measured teleportation parameters in PageRank

Published: 26 April 2010

ABSTRACT

PageRank computes the importance of each node in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter models the probability of following an edge inside the graph or, when the graph comes from a network of web pages and links, clicking a link on a web page. We empirically measure the teleportation parameter based on browser toolbar logs and a click trail analysis. For a particular user or machine, such analysis produces a value of alpha. We find that these values nicely fit a Beta distribution with mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank scores where PageRank is computed with respect to a distribution as the teleportation parameter, rather than a constant teleportation parameter. These new metrics are evaluated on the graph of pages in Wikipedia.
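The idea of computing PageRank with respect to a distribution over alpha, rather than a single constant, can be illustrated with a minimal sketch. It assumes one natural reading of the abstract: the score of a page is its PageRank averaged over values of alpha drawn from a Beta distribution. The metrics defined in the paper may differ. The four-node graph, the Beta(2, 2) shape parameters (mean 0.5), the sample count, and the function names `pagerank` and `expected_pagerank` are all hypothetical choices made for illustration, and the average is estimated by plain Monte Carlo sampling rather than by any method taken from the paper.

```python
import random


def pagerank(out_links, alpha, tol=1e-10, max_iter=1000):
    """PageRank by power iteration with uniform teleportation.

    out_links[i] lists the nodes that node i links to.
    alpha is the probability of following an edge; 1 - alpha is the
    probability of teleporting to a uniformly chosen node.
    """
    n = len(out_links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Every node receives the teleportation mass first.
        nxt = [(1.0 - alpha) / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = alpha * x[i] / len(targets)
                for j in targets:
                    nxt[j] += share
            else:
                # Dangling node: spread its mass uniformly (an assumption).
                share = alpha * x[i] / n
                for j in range(n):
                    nxt[j] += share
        delta = sum(abs(p - q) for p, q in zip(nxt, x))
        x = nxt
        if delta < tol:
            break
    return x


def expected_pagerank(out_links, a, b, samples=200, seed=0):
    """Monte Carlo estimate of E[x(A)] with A ~ Beta(a, b).

    Draws alpha values from the Beta distribution and averages the
    resulting PageRank vectors.
    """
    rng = random.Random(seed)
    n = len(out_links)
    total = [0.0] * n
    for _ in range(samples):
        alpha = rng.betavariate(a, b)
        x = pagerank(out_links, alpha)
        for i in range(n):
            total[i] += x[i]
    return [t / samples for t in total]


# Hypothetical toy graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2.
graph = [[1, 2], [2], [0], [2]]

constant_scores = pagerank(graph, 0.5)
averaged_scores = expected_pagerank(graph, 2.0, 2.0)
print("constant alpha = 0.5:", constant_scores)
print("alpha ~ Beta(2, 2):  ", averaged_scores)
```

Because every sampled vector sums to one, the averaged vector does too, so it can be compared directly with the constant-alpha scores. A quadrature rule over the Beta density would give a deterministic estimate in place of sampling.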

