skip to main content
research-article
Public Access

A Structural Result for Personalized PageRank and its Algorithmic Consequences

Published:19 June 2019Publication History
Skip Abstract Section

Abstract

Many systems, such as the Internet, social networks, and the power grid, can be represented as graphs. When analyzing graphs, it is often useful to compute scores describing the relative importance or distance between nodes. One example is Personalized PageRank (PPR), which assigns to each node v a vector whose i-th entry describes the importance of the i-th node from the perspective of v. PPR has proven useful in many applications, such as recommending who users should follow on social networks (if this i-th entry is large, v may be interested in following the i-th user). Unfortunately, computing n PPR vectors exactly for a graph of n nodes has complexity O(n^3), which is infeasible for many graphs of interest. In this work, we devise a scheme to estimate all n PPR vectors with bounded l_1 error and complexity O(nc), where c < 2 depends on the degrees of the graph at hand, the desired error tolerance, and a parameter that defines PPR. This improves upon existing methods, the best of which have complexity O(n2 łog n) in our setting. Our complexity guarantee holds with high probability, for certain choices of the PPR parameter, and for a certain class of random graphs (roughly speaking, the sparse directed configuration model with heavy-tailed in-degrees); our accuracy guarantee holds with probability 1 and for arbitrary graphs and PPR parameters. The complexity result arises as a consequence of our main (structural) result, which shows that the dimensionality of the set of PPR vectors scales sublinearly in n with high probability, for the same class of random graphs and for a notion of dimensionality similar to matrix rank. It is this coupling of the PPR vectors for the nodes on a common underlying graph that allows for estimating them faster. Hence, at a high level, our scheme is analogous to (but distinct from) low-rank matrix approximation. We also note that our scheme is similar to one that was proposed in [Jeh and Widom 2003] but lacked accuracy and complexity guarantees, so another contribution of our paper is to address this gap in the literature.

References

  1. David J Aldous, Antar Bandyopadhyay, et almbox. 2005. A survey of max-type recursive distributional equations. The Annals of Applied Probability, Vol. 15, 2 (2005), 1047--1110.Google ScholarGoogle ScholarCross RefCross Ref
  2. Brian Amento, Loren Terveen, and Will Hill. 2000. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 296--303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Reid Andersen, Christian Borgs, Jennifer Chayes, John Hopcroft, Vahab Mirrokni, and Shang-Hua Teng. 2008. Local computation of PageRank contributions. Internet Mathematics, Vol. 5, 1--2 (2008), 23--45.Google ScholarGoogle ScholarCross RefCross Ref
  4. Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using PageRank vectors. In 2006 IEEE Symposium on Foundations of Computer Science. IEEE, 475--486. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Konstantin Avrachenkov, Arun Kadavankandy, Liudmila Ostroumova Prokhorenkova, and Andrei Raigorodskii. 2015. PageRank in undirected random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 151--163. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Konstantin Avrachenkov and Dmitri Lebedev. 2006. PageRank of scale-free growing networks. Internet Mathematics, Vol. 3, 2 (2006), 207--231.Google ScholarGoogle ScholarCross RefCross Ref
  7. Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. 2007. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numer. Anal., Vol. 45, 2 (2007), 890--904. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science, Vol. 286, 5439 (1999), 509--512.Google ScholarGoogle Scholar
  9. Edward A Bender and E Rodney Canfield. 1978. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, Vol. 24, 3 (1978), 296--307.Google ScholarGoogle Scholar
  10. Paolo Boldi, Massimo Santini, and Sebastiano Vigna. {n. d.}. Laboratory for Web Algorithmics datasets. http://law.di.unimi.it/datasets.php .Google ScholarGoogle Scholar
  11. Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2005. PageRank as a function of the damping factor. In Proceedings of the 14th international conference on World Wide Web. ACM, 557--566. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2009. PageRank: functional dependencies. ACM Transactions on Information Systems (TOIS), Vol. 27, 4 (2009), 19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web. ACM, 595--602. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Béla Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, Vol. 1, 4 (1980), 311--316.Google ScholarGoogle Scholar
  15. Charles Bordenave. {n. d.}. Lecture notes on random graphs and probabilistic combinatorial optimization. https://www.math.univ-toulouse.fr/ bordenave/coursRG.pdf .Google ScholarGoogle Scholar
  16. Charles Bordenave, Pietro Caputo, and Justin Salez. 2016. Cutoff at the "entropic time" for sparse Markov chains. Probability Theory and Related Fields (2016), 1--32.Google ScholarGoogle Scholar
  17. Charles Bordenave, Pietro Caputo, and Justin Salez. 2018. Random walk on sparse random digraphs. Probability Theory and Related Fields, Vol. 170, 3--4 (2018), 933--960.Google ScholarGoogle ScholarCross RefCross Ref
  18. Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng. 2014. Multiscale matrix sampling and sublinear-time pagerank computation. Internet Mathematics, Vol. 10, 1--2 (2014), 20--48.Google ScholarGoogle ScholarCross RefCross Ref
  19. Moses Charikar and Amit Sahai. 2002. Dimension reduction in the l1 norm. In 2002 IEEE Symposium on Foundations of Computer Science. IEEE, 551--560. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2014. PageRank in scale-free random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 120--131.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2017. Generalized PageRank on directed configuration networks. Random Structures & Algorithms, Vol. 51, 2 (2017), 237--274.Google ScholarGoogle ScholarCross RefCross Ref
  22. Ningyuan Chen and Mariana Olvera-Cravioto. 2013. Directed random graphs with given degree distributions. Stochastic Systems, Vol. 3, 1 (2013), 147--186.Google ScholarGoogle ScholarCross RefCross Ref
  23. Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. 2009. Power-law distributions in empirical data. SIAM Rev., Vol. 51, 4 (2009), 661--703. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D.P. Dubhashi and A. Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms .Cambridge University Press. https://books.google.com/books?id=UUohAwAAQBAJ Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Santo Fortunato, Marián Bogu ná, Alessandro Flammini, and Filippo Menczer. 2006. Approximating PageRank from in-degree. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--71. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. R.G. Gallager. 2013. Stochastic Processes: Theory for Applications .Cambridge University Press. 2013005146 https://books.google.com/books?id=ERLrAQAAQBAJGoogle ScholarGoogle ScholarCross RefCross Ref
  27. Alessandro Garavaglia, Remco van der Hofstad, and Nelly Litvak. 2018. Local weak convergence for PageRank. arXiv preprint arXiv:1803.06146 (2018).Google ScholarGoogle Scholar
  28. David F Gleich. 2015. PageRank beyond the Web. SIAM Rev., Vol. 57, 3 (2015), 321--363.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at Twitter. In Proceedings of the 22nd international conference on World Wide Web. ACM, 505--514. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024--1034. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Taher H Haveliwala. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web. ACM, 517--526. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Piotr Indyk. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In 2000 IEEE Symposium on Foundations of Computer Science. IEEE, 189--197. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web. ACM, 271--279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A.J. Laub. 2005. Matrix Analysis for Scientists and Engineers .Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104). https://books.google.com/books?id=nCsLwYJdOlwC Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems. 556--562.Google ScholarGoogle Scholar
  36. Jiung Lee and Mariana Olvera-Cravioto. 2017. PageRank on inhomogeneous random digraphs. arXiv preprint arXiv:1707.02492 (2017).Google ScholarGoogle Scholar
  37. Jure Leskovec and Andrej Krevl. {n. d.}. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data .Google ScholarGoogle Scholar
  38. Nelly Litvak, Werner RW Scheinhardt, and Yana Volkovich. 2007. In-degree and PageRank: Why do they follow similar power laws? Internet Mathematics, Vol. 4, 2--3 (2007), 175--198.Google ScholarGoogle ScholarCross RefCross Ref
  39. Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 163--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Peter Lofgren and Ashish Goel. 2013. Personalized pagerank to a target node. arXiv preprint arXiv:1304.4658 (2013).Google ScholarGoogle Scholar
  41. Ivan Markovsky. 2011. Low rank approximation: algorithms, implementation, applications .Springer Science & Business Media.Google ScholarGoogle Scholar
  42. Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random structures & algorithms, Vol. 6, 2--3 (1995), 161--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.Google ScholarGoogle Scholar
  44. Kijung Shin, Jinhong Jung, Sael Lee, and U Kang. 2015. Bear: Block elimination approach for random walk with restart on large graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1571--1585. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2008. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems, Vol. 14, 3 (2008), 327--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Remco van der Hofstad, Gerard Hooghiemstra, and Piet Van Mieghem. 2005. Distances in random graphs with finite variance degrees. Random Structures & Algorithms, Vol. 27, 1 (2005), 76--123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Yana Volkovich and Nelly Litvak. 2010. Asymptotic analysis for personalized web search. Advances in Applied Probability, Vol. 42, 2 (2010), 577--604.Google ScholarGoogle ScholarCross RefCross Ref
  48. Yana Volkovich, Nelly Litvak, and Debora Donato. 2007. Determining factors behind the PageRank log-log plot. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 108--123.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Nicholas C Wormald. 1980. Some problems in the enumeration of labelled graphs. Bulletin of the Australian Mathematical Society, Vol. 21, 1 (1980), 159--160.Google ScholarGoogle ScholarCross RefCross Ref
  50. Zhanxing Zhu, Zhirong Yang, and Erkki Oja. 2013. Multiplicative updates for learning with stochastic matrices. In Scandinavian Conference on Image Analysis. Springer, 143--152.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. A Structural Result for Personalized PageRank and its Algorithmic Consequences

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!