Abstract
We consider the read/write streams model, an extension of the standard data stream model in which an algorithm can create and manipulate multiple read/write streams in addition to its input data stream. As in the data stream model, the most important parameter is the amount of internal memory the algorithm uses. The other key parameters are the number of streams the algorithm uses and the number of passes it makes over these streams. We consider how the addition of multiple streams affects the ability of algorithms to approximate the frequency moments of the input stream.
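To make the quantity concrete: if value $i$ occurs $f_i$ times in the input sequence, the $k$-th frequency moment is $F_k = \sum_i f_i^k$. The minimal Python sketch below (the function name `frequency_moment` is hypothetical, not from the paper) computes $F_k$ exactly in a single pass, but it keeps one counter per distinct value, so its memory can grow to $N$ counters for inputs from $\{1, \ldots, N\}$. This is why internal memory is the parameter of interest and why the results concern approximation in smaller space.

```python
from collections import Counter
from typing import Iterable


def frequency_moment(stream: Iterable[int], k: int) -> int:
    """Exact F_k = sum of f_i ** k over distinct values i, in one pass.

    Hypothetical helper for illustration. The Counter holds one entry per
    distinct value seen, so its size can reach N for inputs from {1, ..., N}.
    """
    counts: Counter = Counter()
    for x in stream:  # single sequential pass over the input
        counts[x] += 1
    return sum(f ** k for f in counts.values())


print(frequency_moment(iter([3, 1, 3, 2, 3, 1]), 2))  # 14
```

Here the value 3 occurs three times, 1 twice and 2 once, so $F_2 = 9 + 4 + 1 = 14$.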
We show that any randomized read/write stream algorithm with a fixed number of streams and a sublogarithmic number of passes that produces a constant-factor approximation of the $k$-th frequency moment $F_k$ of an input sequence of length at most $N$ from $\{1, \ldots, N\}$ requires space $\Omega(N^{1-4/k-\delta})$ for any $\delta > 0$. For comparison, it is known that with a single read-only, one-pass data stream there is a randomized constant-factor approximation for $F_k$ using $\tilde{O}(N^{1-2/k})$ space, and that by sorting, which can be done deterministically in $O(\log N)$ passes using 3 read/write streams, $F_k$ can be computed exactly. Therefore, although the ability to manipulate multiple read/write streams can add substantial power to the data stream model, with a sublogarithmic number of passes it does not significantly improve the ability to approximate higher frequency moments efficiently. We obtain our results by showing a new connection between the read/write streams model and the multiparty number-in-hand communication model.
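The sorting route to exact $F_k$ can be illustrated with a minimal sketch. It assumes a toy `RWStream` class (hypothetical, not from the paper) that allows only sequential reads and appends and counts read scans, but does not enforce a memory bound. The sort is a textbook balanced two-way tape merge sort on three streams, which may differ from the procedure the authors have in mind. Each round doubles the length of the sorted runs, so there are $\lceil \log_2 n \rceil$ rounds of a constant number of passes each. The routines hold only a constant number of stream items plus $O(\log N)$-bit counters. A final pass sums $f_i^k$ over the runs of equal values.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class RWStream:
    """Toy read/write stream (hypothetical helper): sequential access only.

    `passes` counts sequential read scans; writes only append. Internal
    memory is not enforced, but the routines below keep O(1) items in hand.
    """

    def __init__(self, items: Optional[Iterable[int]] = None) -> None:
        self._items: List[int] = list(items) if items is not None else []
        self.passes = 0

    def scan(self) -> Iterator[int]:
        self.passes += 1
        return iter(self._items)

    def clear(self) -> None:
        self._items = []

    def write(self, x: int) -> None:
        self._items.append(x)


def distribute(src: RWStream, a: RWStream, b: RWStream, run: int) -> None:
    # Copy consecutive runs of length `run` from src alternately onto a and b.
    a.clear()
    b.clear()
    for i, x in enumerate(src.scan()):
        (a if (i // run) % 2 == 0 else b).write(x)


def merge(a: RWStream, b: RWStream, dst: RWStream, run: int) -> None:
    # Merge the i-th sorted run of a with the i-th sorted run of b onto dst.
    # heapq.merge keeps only one pending item per input run.
    dst.clear()
    a_it, b_it = a.scan(), b.scan()
    while True:
        wrote = False
        for x in heapq.merge(islice(a_it, run), islice(b_it, run)):
            dst.write(x)
            wrote = True
        if not wrote:  # both scratch streams are exhausted
            break


def tape_sort(t0: RWStream, t1: RWStream, t2: RWStream) -> int:
    """Sort t0 using t1 and t2 as scratch streams; return the number of rounds."""
    n = sum(1 for _ in t0.scan())  # one counting pass
    run, rounds = 1, 0
    while run < n:
        distribute(t0, t1, t2, run)
        merge(t1, t2, t0, run)
        run *= 2
        rounds += 1
    return rounds


def fk_from_sorted(stream: RWStream, k: int) -> int:
    # After sorting, equal values are adjacent, so one pass with O(1) state
    # adds f_i ** k for each distinct value i.
    total, prev, count = 0, None, 0
    for x in stream.scan():
        if x == prev:
            count += 1
        else:
            if count:
                total += count ** k
            prev, count = x, 1
    if count:
        total += count ** k
    return total


t0 = RWStream([3, 1, 3, 2, 3, 1])
rounds = tape_sort(t0, RWStream(), RWStream())
print(rounds, fk_from_sorted(t0, 2))  # 3 14
```

On the input `[3, 1, 3, 2, 3, 1]`, `tape_sort` performs 3 rounds ($\lceil \log_2 6 \rceil$) and `fk_from_sorted` returns $2^2 + 1^2 + 3^2 = 14$.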
The Value of Multiple Read/Write Streams for Approximating Frequency Moments