Estimating PageRank on graph streams

Research article. Published: 09 June 2008. DOI: 10.1145/1376916.1376928

ABSTRACT

This study focuses on computations on large graphs (e.g., the web graph) where the edges of the graph are presented as a stream. The objective in the streaming model is to use a small amount of memory (preferably sublinear in the number of nodes n) and only a few passes over the stream.
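
To make the model concrete, here is a minimal Python sketch of a naive baseline, not the algorithm of this paper. It advances a random walk by one step per pass over the edge stream, using reservoir sampling so that only a constant number of words are stored. The names `walk_step`, `naive_walk`, and `make_stream` are hypothetical, and the stream is assumed to be replayable as an iterable of `(source, target)` pairs.

```python
import random
from typing import Callable, Iterable, Optional, Tuple

Edge = Tuple[int, int]


def walk_step(edges: Iterable[Edge], current: int,
              rng: random.Random) -> Optional[int]:
    """Make one pass over the edge stream and return a uniformly random
    out-neighbour of `current`, or None if it has no outgoing edge."""
    chosen = None
    seen = 0  # number of out-edges of `current` seen so far in this pass
    for u, v in edges:
        if u != current:
            continue
        seen += 1
        # Reservoir of size one: keep the new edge with probability 1/seen,
        # so each out-edge is the final choice with equal probability.
        if rng.randrange(seen) == 0:
            chosen = v
    return chosen


def naive_walk(make_stream: Callable[[], Iterable[Edge]], start: int,
               length: int, rng: random.Random) -> int:
    """Walk `length` steps from `start`, spending one pass per step."""
    node = start
    for _ in range(length):
        nxt = walk_step(make_stream(), node, rng)
        if nxt is None:  # assumption: stop early at a node with no out-edge
            break
        node = nxt
    return node
```

A walk of length l costs l passes under this baseline, which is the pass count that a space-for-passes trade-off aims to reduce.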

In the streaming model, we show how to perform several graph computations, including estimating the probability distribution after a random walk of length $l$, the mixing time, and the conductance. We estimate the mixing time $M$ of a random walk in $\tilde{O}(n\alpha + M\alpha\sqrt{n} + \sqrt{Mn/\alpha})$ space and $\tilde{O}(\sqrt{M/\alpha})$ passes. Furthermore, the relation between mixing time and conductance gives us an estimate for the conductance of the graph. By applying our algorithm for computing the probability distribution to the web graph, we can estimate the PageRank $p$ of any node up to an additive error of $\sqrt{\epsilon p}$ in $\tilde{O}(\sqrt{M/\alpha})$ passes and $\tilde{O}\left(\min\left(n\alpha + \frac{1}{\epsilon}\sqrt{M/\alpha} + \frac{1}{\epsilon}M\alpha,\ \alpha n\sqrt{M\alpha} + \frac{1}{\epsilon}\sqrt{M/\alpha}\right)\right)$ space, for any $\alpha \in (0, 1]$. In particular, for $\epsilon = M/n$, by setting $\alpha = M^{-1/2}$, we can compute the approximate PageRank values in $\tilde{O}(nM^{-1/4})$ space and $\tilde{O}(M^{3/4})$ passes. In comparison, a standard implementation of the PageRank algorithm will take $O(n)$ space and $O(M)$ passes.
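
For comparison, the standard baseline mentioned above can be sketched as power iteration over the edge stream: one vector of n scores is held in memory and each iteration costs one pass. This is a minimal sketch under stated assumptions, not the algorithm of this paper. The function name `streaming_pagerank`, the damping factor 0.85, and the uniform redistribution of rank from nodes with no out-edges are all assumptions introduced for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[int, int]


def streaming_pagerank(make_stream: Callable[[], Iterable[Edge]], n: int,
                       passes: int, damping: float = 0.85) -> List[float]:
    """Power iteration with O(n) memory and one stream pass per iteration."""
    # Preliminary pass: count out-degrees (parallel edges count separately).
    outdeg = [0] * n
    for u, _ in make_stream():
        outdeg[u] += 1

    rank = [1.0 / n] * n
    for _ in range(passes):
        nxt = [0.0] * n
        # One pass: each edge pushes an equal share of its source's rank.
        for u, v in make_stream():
            nxt[v] += rank[u] / outdeg[u]
        # Assumption: rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in range(n) if outdeg[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        rank = [base + damping * x for x in nxt]
    return rank
```

Running the loop for a number of passes proportional to the mixing time M gives the O(n) space and O(M) passes quoted for the standard approach.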


Published in

PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
June 2008, 330 pages
ISBN: 9781605581521
DOI: 10.1145/1376916
Copyright © 2008 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 476 of 1,835 submissions, 26%
