ABSTRACT
Computing power has been growing steadily, as have communication rates and memory sizes. At the same time, our ability to create data has grown phenomenally, and with it the need to analyze that data. We now have examples of massive data streams that are created at far higher rates than we can economically capture and store in memory, that are gathered in far greater quantity than can be transported to central databases without overwhelming the communication infrastructure, and that arrive far faster than we can compute with them in a sophisticated way.
This phenomenon has challenged how we store, communicate and compute with data. Theories developed over the past 50 years have relied on full capture, storage and communication of data. Instead, what we need for managing modern massive data streams are new methods built around working with less. The past 10 years have seen new theories emerge in computing (data stream algorithms), communication (compressed sensing), databases (data stream management systems) and other areas to address the challenges of massive data streams. Still, a lot remains open, and new applications of massive data streams have emerged recently. We present an overview of these challenges.
Index Terms
Theory of data stream computing: where to go