DOI: 10.1145/1989284.1989314
research-article

Theory of data stream computing: where to go

Published: 13 June 2011

ABSTRACT

Computing power has been growing steadily, as have communication rates and memory sizes. At the same time, our ability to create data has been growing phenomenally, and with it the need to analyze that data. We now have examples of massive data streams that are created at a far higher rate than we can capture and store in memory economically, that are gathered in far greater quantity than can be transported to central databases without overwhelming the communication infrastructure, and that arrive far faster than we can compute with them in a sophisticated way.

This phenomenon has challenged how we store, communicate and compute with data. Theories developed over the past 50 years have relied on full capture, storage and communication of data. What we need instead for managing modern massive data streams are new methods built around working with less. The past 10 years have seen new theories emerge in computing (data stream algorithms), communication (compressed sensing), databases (data stream management systems) and other areas to address the challenges of massive data streams. Still, a lot remains open, and new applications of massive data streams have emerged recently. We present an overview of these challenges.

Supplemental Material

1989314.wmv

    Published in

      PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
      June 2011
      332 pages
      ISBN:9781450306607
      DOI:10.1145/1989284

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 476 of 1,835 submissions, 26%
