Estimating PageRank on graph streams

Published: 09 June 2011

Abstract

This article focuses on computations on large graphs (e.g., the web-graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sub-linear in the number of nodes $n$) and a small number of passes.
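To make the model concrete, here is a minimal Python sketch of a multi-pass computation over an edge stream. It is the naive baseline, not the article's algorithm: it keeps O(n) state and spends one pass per step of a random walk. The names `walk_distribution` and `edge_stream`, and the convention that each call to `edge_stream()` replays the stream for one pass, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Edge = Tuple[int, int]  # directed edge (u, v)


def walk_distribution(
    edge_stream: Callable[[], Iterable[Edge]],
    start: int,
    length: int,
) -> Dict[int, float]:
    """Distribution of a random walk after `length` steps from `start`.

    Each call to edge_stream() is one pass over the edges, in whatever
    order the stream presents them. Memory is O(n): one counter and one
    probability per node. Passes used: length + 1.
    """
    # Pass 0: count out-degrees so probability mass can be split evenly.
    out_deg: Dict[int, int] = defaultdict(int)
    for u, _ in edge_stream():
        out_deg[u] += 1

    dist: Dict[int, float] = {start: 1.0}
    for _ in range(length):
        nxt: Dict[int, float] = defaultdict(float)
        # One pass: push mass along every edge leaving a node with mass.
        # Mass sitting on a node with no out-edges is dropped in this sketch.
        for u, v in edge_stream():
            if u in dist:
                nxt[v] += dist[u] / out_deg[u]
        dist = dict(nxt)
    return dist


# Usage: a four-edge graph replayed from a list on every pass.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
passes = 0


def stream() -> Iterable[Edge]:
    global passes
    passes += 1
    return iter(edges)


result = walk_distribution(stream, start=0, length=2)
```

Because every step costs a full pass, a walk of length l needs l + 1 passes here. That cost is what makes trading passes for space worthwhile in this model.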

In the streaming model, we show how to perform several graph computations, including estimating the probability distribution after a random walk of length $l$, the mixing time $M$, and other related quantities such as the conductance of the graph. By applying our algorithm for computing the probability distribution on the web-graph, we can estimate the PageRank $p$ of any node up to an additive error of $\sqrt{\epsilon p} + \epsilon$ in $\tilde{O}(\sqrt{M/\alpha})$ passes and $\tilde{O}\left(\min\left(n\alpha + \frac{1}{\epsilon}\sqrt{M/\alpha} + \frac{1}{\epsilon}M\alpha,\ \alpha n\sqrt{M\alpha} + \frac{1}{\epsilon}\sqrt{M/\alpha}\right)\right)$ space, for any $\alpha \in (0,1]$. Specifically, for $\epsilon = M/n$ and $\alpha = M^{-1/2}$, we can compute the approximate PageRank values in $\tilde{O}(nM^{-1/4})$ space and $\tilde{O}(M^{3/4})$ passes. In comparison, a standard implementation of the PageRank algorithm takes $O(n)$ space and $O(M)$ passes. We also give an approach to approximate the PageRank values in just $\tilde{O}(1)$ passes, although this requires $\tilde{O}(nM)$ space.
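
As a check on the stated special case, the substitution can be worked through term by term. This sketch assumes the square roots in the general bounds cover $M/\alpha$ and $M\alpha$ respectively, which is the reading under which the special case comes out as stated. Logarithmic factors hidden by $\tilde{O}$ are ignored. With $\epsilon = M/n$ and $\alpha = M^{-1/2}$:

\[
\begin{aligned}
\text{passes:}\quad \sqrt{M/\alpha} &= \sqrt{M \cdot M^{1/2}} = M^{3/4},\\
n\alpha &= nM^{-1/2},\\
\frac{1}{\epsilon}\sqrt{M/\alpha} &= \frac{n}{M}\,M^{3/4} = nM^{-1/4},\\
\frac{1}{\epsilon}M\alpha &= \frac{n}{M}\,M^{1/2} = nM^{-1/2},\\
\alpha n\sqrt{M\alpha} &= M^{-1/2}\,n\,M^{1/4} = nM^{-1/4}.
\end{aligned}
\]

Since $M \ge 1$, the first argument of the minimum is dominated by $nM^{-1/4}$, and the second argument is $2nM^{-1/4}$. Both are therefore $\tilde{O}(nM^{-1/4})$, matching the space bound quoted for this setting, and the pass count is $\tilde{O}(M^{3/4})$.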
