skip to main content
10.1145/2213556.2213560acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Graph sketches: sparsification, spanners, and subgraphs

Published:21 May 2012Publication History

ABSTRACT

When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements.

In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense sub-graphs. Our main result is a sketch-based sparsifier construction: we show that Õ(nε-2) random linear projections of a graph on n nodes suffice to (1+ε) approximate all cut values. Similarly, we show that Õ(ε-2) linear projections suffice for (additively) approximating the fraction of induced sub-graphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertion and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.

References

  1. K. J. Ahn and S. Guha. Graph sparsification in the semi-streaming model. In ICALP (2), pages 328--338, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. K. J. Ahn and S. Guha. Laminar families and metric embeddings: Non-bipartite maximum matching problem in the semi-streaming model. Manuscript, available at http://arxiv.org/abs/1104.4058, 2011.Google ScholarGoogle Scholar
  3. K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. In ICALP (2), pages 526--538, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In SODA, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137--147, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. of SODA, pages 623--632, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Baswana and S. Sen. A simple and linear time randomized algorithm for computing sparse spanners in weighted graphs. Random Struct. Algorithms, 30(4):532--563, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in Õ(n2)time. In STOC, pages 47--55, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, pages 253--262, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theor. Comput. Sci., 312(1):3--15, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In STOC, pages 205--214, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Cormode. Sketch techniques for approximate query processing. In G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine, editors, Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW publishers, 2011.Google ScholarGoogle Scholar
  13. G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Cormode, S. Muthukrishnan, and I. Rozenbaum. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In VLDB, pages 25--36, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Optimal sampling from distributed streams. In PODS, pages 77--86, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Elkin. A near-optimal fully dynamic distributed algorithm for maintaining sparse spanners, 2006.Google ScholarGoogle Scholar
  17. M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Epstein, A. Levin, J. Mestre, and D. Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. CoRR, abs/00907.0305, 2000.Google ScholarGoogle Scholar
  19. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2):207--216, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709--1727, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. G. Frahling, P. Indyk, and C. Sohler. Sampling in dynamic data streams and applications. In Symposium on Computational Geometry, pages 142--149, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. W. S. Fung, R. Hariharan, N. J. A. Harvey, and D. Panigrahi. A general framework for graph sparsification. In STOC, pages 71--80, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Ganguly and L. Bhuvanagiri. Hierarchical sampling from sketches: Estimating functions over data streams. Algorithmica, 53(4):549--582, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937--947, june 2010.Google ScholarGoogle Scholar
  25. R. E. Gomory and T. C. Hu. Multi-Terminal Network Flows. Journal of the Society for Industrial and Applied Mathematics, 9(4):551--570, 1961.Google ScholarGoogle ScholarCross RefCross Ref
  26. S. Guha, N. Koudas, and K. Shim. Approximation and streaming algorithms for histogram construction problems. ACM Trans. Database Syst., 31(1):396--438, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. J. ACM, 53(3):307--323, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202--208. ACM New York, NY, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. W. B. Johnson and J. Lindenstrauss. Extensions of Lipshitz mapping into Hilbert Space. Contemporary Mathematics, Vol 26, pages 189--206, May 1984.Google ScholarGoogle Scholar
  30. H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710--716, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In PODS, pages 49--58, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. M. Kane, J. Nelson, E. Porat, and D. P. Woodruff. Fast moment estimation in data streams in optimal space. In STOC, pages 745--754, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41--52, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. D. R. Karger. Random sampling in cut, flow, and network design problems. In STOC, pages 648--657, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. A. Kelner and A. Levin. Spectral sparsification in the semi-streaming setting. In STACS, pages 440--451, 2011.Google ScholarGoogle Scholar
  36. A. McGregor. Finding graph matchings in data streams. APPROX-RANDOM, pages 170--181, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. McGregor. Graph mining on streams. In Encyclopedia of Database Systems, pages 1271--1275, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  38. S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, 2006.Google ScholarGoogle Scholar
  39. N. Nisan. Pseudorandom generators for space-bounded computation. Combinatorica, 12(4):449--461, 1992.Google ScholarGoogle Scholar
  40. A. Schrijver. Combinatorial Optimization - Polyhedra and Efficiency, volume 24 of Algorithms and Combinatorics. Springer, 2003.Google ScholarGoogle Scholar
  41. M. Zelke. Weighted matching in the semi-streaming model. Algorithmica DOI: 10.1007/s00453-010-9438-5, 2010. Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Graph sketches: sparsification, spanners, and subgraphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
      May 2012
      332 pages
      ISBN:9781450312486
      DOI:10.1145/2213556

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 May 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate476of1,835submissions,26%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!