Abstract
Many systems, such as the Internet, social networks, and the power grid, can be represented as graphs. When analyzing graphs, it is often useful to compute scores describing the relative importance or distance between nodes. One example is Personalized PageRank (PPR), which assigns to each node v a vector whose i-th entry describes the importance of the i-th node from the perspective of v. PPR has proven useful in many applications, such as recommending who users should follow on social networks (if this i-th entry is large, v may be interested in following the i-th user). Unfortunately, computing n PPR vectors exactly for a graph of n nodes has complexity O(n^3), which is infeasible for many graphs of interest. In this work, we devise a scheme to estimate all n PPR vectors with bounded l_1 error and complexity O(nc), where c < 2 depends on the degrees of the graph at hand, the desired error tolerance, and a parameter that defines PPR. This improves upon existing methods, the best of which have complexity O(n2 łog n) in our setting. Our complexity guarantee holds with high probability, for certain choices of the PPR parameter, and for a certain class of random graphs (roughly speaking, the sparse directed configuration model with heavy-tailed in-degrees); our accuracy guarantee holds with probability 1 and for arbitrary graphs and PPR parameters. The complexity result arises as a consequence of our main (structural) result, which shows that the dimensionality of the set of PPR vectors scales sublinearly in n with high probability, for the same class of random graphs and for a notion of dimensionality similar to matrix rank. It is this coupling of the PPR vectors for the nodes on a common underlying graph that allows for estimating them faster. Hence, at a high level, our scheme is analogous to (but distinct from) low-rank matrix approximation. We also note that our scheme is similar to one that was proposed in [Jeh and Widom 2003] but lacked accuracy and complexity guarantees, so another contribution of our paper is to address this gap in the literature.
- David J Aldous, Antar Bandyopadhyay, et almbox. 2005. A survey of max-type recursive distributional equations. The Annals of Applied Probability, Vol. 15, 2 (2005), 1047--1110.Google Scholar
Cross Ref
- Brian Amento, Loren Terveen, and Will Hill. 2000. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 296--303. Google Scholar
Digital Library
- Reid Andersen, Christian Borgs, Jennifer Chayes, John Hopcroft, Vahab Mirrokni, and Shang-Hua Teng. 2008. Local computation of PageRank contributions. Internet Mathematics, Vol. 5, 1--2 (2008), 23--45.Google Scholar
Cross Ref
- Reid Andersen, Fan Chung, and Kevin Lang. 2006. Local graph partitioning using PageRank vectors. In 2006 IEEE Symposium on Foundations of Computer Science. IEEE, 475--486. Google Scholar
Digital Library
- Konstantin Avrachenkov, Arun Kadavankandy, Liudmila Ostroumova Prokhorenkova, and Andrei Raigorodskii. 2015. PageRank in undirected random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 151--163. Google Scholar
Digital Library
- Konstantin Avrachenkov and Dmitri Lebedev. 2006. PageRank of scale-free growing networks. Internet Mathematics, Vol. 3, 2 (2006), 207--231.Google Scholar
Cross Ref
- Konstantin Avrachenkov, Nelly Litvak, Danil Nemirovsky, and Natalia Osipova. 2007. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM J. Numer. Anal., Vol. 45, 2 (2007), 890--904. Google Scholar
Digital Library
- Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. Science, Vol. 286, 5439 (1999), 509--512.Google Scholar
- Edward A Bender and E Rodney Canfield. 1978. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, Vol. 24, 3 (1978), 296--307.Google Scholar
- Paolo Boldi, Massimo Santini, and Sebastiano Vigna. {n. d.}. Laboratory for Web Algorithmics datasets. http://law.di.unimi.it/datasets.php .Google Scholar
- Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2005. PageRank as a function of the damping factor. In Proceedings of the 14th international conference on World Wide Web. ACM, 557--566. Google Scholar
Digital Library
- Paolo Boldi, Massimo Santini, and Sebastiano Vigna. 2009. PageRank: functional dependencies. ACM Transactions on Information Systems (TOIS), Vol. 27, 4 (2009), 19. Google Scholar
Digital Library
- Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proceedings of the 13th International Conference on World Wide Web. ACM, 595--602. Google Scholar
Digital Library
- Béla Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, Vol. 1, 4 (1980), 311--316.Google Scholar
- Charles Bordenave. {n. d.}. Lecture notes on random graphs and probabilistic combinatorial optimization. https://www.math.univ-toulouse.fr/ bordenave/coursRG.pdf .Google Scholar
- Charles Bordenave, Pietro Caputo, and Justin Salez. 2016. Cutoff at the "entropic time" for sparse Markov chains. Probability Theory and Related Fields (2016), 1--32.Google Scholar
- Charles Bordenave, Pietro Caputo, and Justin Salez. 2018. Random walk on sparse random digraphs. Probability Theory and Related Fields, Vol. 170, 3--4 (2018), 933--960.Google Scholar
Cross Ref
- Christian Borgs, Michael Brautbar, Jennifer Chayes, and Shang-Hua Teng. 2014. Multiscale matrix sampling and sublinear-time pagerank computation. Internet Mathematics, Vol. 10, 1--2 (2014), 20--48.Google Scholar
Cross Ref
- Moses Charikar and Amit Sahai. 2002. Dimension reduction in the l1 norm. In 2002 IEEE Symposium on Foundations of Computer Science. IEEE, 551--560. Google Scholar
Digital Library
- Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2014. PageRank in scale-free random graphs. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 120--131.Google Scholar
Digital Library
- Ningyuan Chen, Nelly Litvak, and Mariana Olvera-Cravioto. 2017. Generalized PageRank on directed configuration networks. Random Structures & Algorithms, Vol. 51, 2 (2017), 237--274.Google Scholar
Cross Ref
- Ningyuan Chen and Mariana Olvera-Cravioto. 2013. Directed random graphs with given degree distributions. Stochastic Systems, Vol. 3, 1 (2013), 147--186.Google Scholar
Cross Ref
- Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. 2009. Power-law distributions in empirical data. SIAM Rev., Vol. 51, 4 (2009), 661--703. Google Scholar
Digital Library
- D.P. Dubhashi and A. Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms .Cambridge University Press. https://books.google.com/books?id=UUohAwAAQBAJ Google Scholar
Digital Library
- Santo Fortunato, Marián Bogu ná, Alessandro Flammini, and Filippo Menczer. 2006. Approximating PageRank from in-degree. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 59--71. Google Scholar
Digital Library
- R.G. Gallager. 2013. Stochastic Processes: Theory for Applications .Cambridge University Press. 2013005146 https://books.google.com/books?id=ERLrAQAAQBAJGoogle Scholar
Cross Ref
- Alessandro Garavaglia, Remco van der Hofstad, and Nelly Litvak. 2018. Local weak convergence for PageRank. arXiv preprint arXiv:1803.06146 (2018).Google Scholar
- David F Gleich. 2015. PageRank beyond the Web. SIAM Rev., Vol. 57, 3 (2015), 321--363.Google Scholar
Digital Library
- Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Zadeh. 2013. WTF: The who to follow service at Twitter. In Proceedings of the 22nd international conference on World Wide Web. ACM, 505--514. Google Scholar
Digital Library
- Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems. 1024--1034. Google Scholar
Digital Library
- Taher H Haveliwala. 2002. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web. ACM, 517--526. Google Scholar
Digital Library
- Piotr Indyk. 2000. Stable distributions, pseudorandom generators, embeddings and data stream computation. In 2000 IEEE Symposium on Foundations of Computer Science. IEEE, 189--197. Google Scholar
Digital Library
- Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web. ACM, 271--279. Google Scholar
Digital Library
- A.J. Laub. 2005. Matrix Analysis for Scientists and Engineers .Society for Industrial and Applied Mathematics (SIAM, 3600 Market Street, Floor 6, Philadelphia, PA 19104). https://books.google.com/books?id=nCsLwYJdOlwC Google Scholar
Digital Library
- Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems. 556--562.Google Scholar
- Jiung Lee and Mariana Olvera-Cravioto. 2017. PageRank on inhomogeneous random digraphs. arXiv preprint arXiv:1707.02492 (2017).Google Scholar
- Jure Leskovec and Andrej Krevl. {n. d.}. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data .Google Scholar
- Nelly Litvak, Werner RW Scheinhardt, and Yana Volkovich. 2007. In-degree and PageRank: Why do they follow similar power laws? Internet Mathematics, Vol. 4, 2--3 (2007), 175--198.Google Scholar
Cross Ref
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, 163--172. Google Scholar
Digital Library
- Peter Lofgren and Ashish Goel. 2013. Personalized pagerank to a target node. arXiv preprint arXiv:1304.4658 (2013).Google Scholar
- Ivan Markovsky. 2011. Low rank approximation: algorithms, implementation, applications .Springer Science & Business Media.Google Scholar
- Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random structures & algorithms, Vol. 6, 2--3 (1995), 161--180. Google Scholar
Digital Library
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.Google Scholar
- Kijung Shin, Jinhong Jung, Sael Lee, and U Kang. 2015. Bear: Block elimination approach for random walk with restart on large graphs. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1571--1585. Google Scholar
Digital Library
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2008. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems, Vol. 14, 3 (2008), 327--346. Google Scholar
Digital Library
- Remco van der Hofstad, Gerard Hooghiemstra, and Piet Van Mieghem. 2005. Distances in random graphs with finite variance degrees. Random Structures & Algorithms, Vol. 27, 1 (2005), 76--123. Google Scholar
Digital Library
- Yana Volkovich and Nelly Litvak. 2010. Asymptotic analysis for personalized web search. Advances in Applied Probability, Vol. 42, 2 (2010), 577--604.Google Scholar
Cross Ref
- Yana Volkovich, Nelly Litvak, and Debora Donato. 2007. Determining factors behind the PageRank log-log plot. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 108--123.Google Scholar
Digital Library
- Nicholas C Wormald. 1980. Some problems in the enumeration of labelled graphs. Bulletin of the Australian Mathematical Society, Vol. 21, 1 (1980), 159--160.Google Scholar
Cross Ref
- Zhanxing Zhu, Zhirong Yang, and Erkki Oja. 2013. Multiplicative updates for learning with stochastic matrices. In Scandinavian Conference on Image Analysis. Springer, 143--152.Google Scholar
Cross Ref
Index Terms
A Structural Result for Personalized PageRank and its Algorithmic Consequences
Recommendations
Personalized PageRank to a Target Node, Revisited
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningPersonalized PageRank (PPR) is a widely used node proximity measure in graph mining and network analysis. Given a source node s and a target node t, the PPR value π(s,t) represents the probability that a random walk from s terminates at t, and thus ...
Approximate Personalized PageRank on Dynamic Graphs
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningWe propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One,...
Personalized PageRank Estimation and Search: A Bidirectional Approach
WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data MiningWe present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with ...






Comments