Abstract
This article focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sub-linear in the number of nodes $n$) and a small number of passes.
In the streaming model, we show how to perform several graph computations, including estimating the probability distribution after a random walk of length $l$, the mixing time $M$, and other related quantities such as the conductance of the graph. By applying our algorithm for computing the probability distribution on the web-graph, we can estimate the PageRank $p$ of any node up to an additive error of $\sqrt{\epsilon p} + \epsilon$ in $\tilde{O}(\sqrt{M/\alpha})$ passes and $\tilde{O}\left(\min\left(n\alpha + \frac{1}{\epsilon}\sqrt{M/\alpha} + \frac{1}{\epsilon}M\alpha,\; \alpha n\sqrt{M\alpha} + \frac{1}{\epsilon}\sqrt{M/\alpha}\right)\right)$ space, for any $\alpha \in (0,1]$. Specifically, for $\epsilon = M/n$ and $\alpha = M^{-1/2}$, we can compute the approximate PageRank values in $\tilde{O}(nM^{-1/4})$ space and $\tilde{O}(M^{3/4})$ passes. In comparison, a standard implementation of the PageRank algorithm takes $O(n)$ space and $O(M)$ passes. We also give an approach to approximate the PageRank values in just $\tilde{O}(1)$ passes, although this requires $\tilde{O}(nM)$ space.
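For orientation, the baseline that the abstract compares against can be sketched as a multi-pass power iteration over the edge stream. This is a minimal illustration and not the article's algorithm: the function name `pagerank_power_iteration`, the `stream_edges` callback (which replays the edge stream once per call), and the reset probability of 0.15 are all assumptions introduced here. It keeps O(n) numbers in memory and makes one pass over the edges per iteration, so the number of passes needed grows with how long the walk takes to mix.

```python
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[int, int]


def pagerank_power_iteration(
    stream_edges: Callable[[], Iterable[Edge]],  # hypothetical: replays the stream
    n: int,
    num_passes: int,
    reset_prob: float = 0.15,  # assumed value, not taken from the abstract
) -> List[float]:
    # One preliminary pass to count out-degrees (O(n) counters).
    out_deg = [0] * n
    for u, _ in stream_edges():
        out_deg[u] += 1

    rank = [1.0 / n] * n
    for _ in range(num_passes):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        base = reset_prob / n + (1.0 - reset_prob) * dangling / n
        nxt = [base] * n
        # One pass over the stream: push each node's rank along its out-edges.
        for u, v in stream_edges():
            nxt[v] += (1.0 - reset_prob) * rank[u] / out_deg[u]
        rank = nxt
    return rank


# Tiny example graph on 3 nodes; the list stands in for a replayable stream.
edges = [(0, 1), (1, 2), (2, 0), (2, 1)]
ranks = pagerank_power_iteration(lambda: iter(edges), n=3, num_passes=50)
```

The rank vector is the only state carried between passes, which is why this baseline cannot go below linear space in $n$; the trade-offs stated in the abstract exchange that space for a different number of passes.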