skip to main content
research-article

The Value of Multiple Read/Write Streams for Approximating Frequency Moments

Published:01 January 2012Publication History
Skip Abstract Section

Abstract

We consider the read/write streams model, an extension of the standard data stream model in which an algorithm can create and manipulate multiple read/write streams in addition to its input data stream. Like the data stream model, the most important parameter for this model is the amount of internal memory used by such an algorithm. The other key parameters are the number of streams the algorithm uses and the number of passes it makes on these streams. We consider how the addition of multiple streams impacts the ability of algorithms to approximate the frequency moments of the input stream.

We show that any randomized read/write stream algorithm with a fixed number of streams and a sublogarithmic number of passes that produces a constant factor approximation of the k-th frequency moment Fk of an input sequence of length of at most N from {1,..., N} requires space Ω(N 1−4/k−δ) for any δ > 0. For comparison, it is known that with a single read-only one-pass data stream there is a randomized constant-factor approximation for Fk using Õ(N1−2/k) space, and that by sorting, which can be done deterministically in O(log N) passes using 3 read/write streams, Fk can be computed exactly. Therefore, although the ability to manipulate multiple read/write streams can add substantial power to the data stream model, with a sublogarithmic number of passes this does not significantly improve the ability to approximate higher frequency moments efficiently. We obtain our results by showing a new connection between the read/write streams model and the multiparty number-in-hand communication model.

References

  1. Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 137--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st Annual ACM Symposium on Principles of Database Systems. 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bar-Yossef, Z., Jayram, T., Kumar, R., and Sivakumar, D. 2004. An information statistics approach to data stream and communication complexity. J. Comput. Syst. Sci. 68, 4, 702--732. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Beame, P., Blais, E., and Huynh-Ngoc, D.-T. 2009. Longest common subsequences in sets of permutations. Tech. rep. arXiv:R0904.1615v1 {math.CO}.Google ScholarGoogle Scholar
  5. Beame, P. and Huynh-Ngoc, D.-T. 2008. On the value of multiple read/write streams for approximating frequency moments. In Proceedings of the 49th Annual Symposium on Foundations of Computer Science. IEEE, 499--508. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Beame, P., Jayram, T. S., and Rudra, A. 2007. Lower bounds for randomized read/write stream algorithms. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing. 689--698. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bhuvanagiri, L., Ganguly, S., Kesh, D., and Saha, C. 2006. Simpler algorithms for estimating frequency moments of data streams. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms. 708--713. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chakrabarti, A., Khot, S., and Sun, X. 2003. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In Proceedings of the 18th Annual IEEE Conference on Computational Complexity. 107--117.Google ScholarGoogle Scholar
  9. Chakrabarty, A. and Regev, O. 2011. An optimal lower bound on the communication complexity of gap-hamming-distance. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing. 51--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Grohe, M. and Schweikardt, N. 2005. Lower bounds for sorting with few random accesses to external memory. In Proceedings of the 24th Annual ACM Symposium on Principles of Database Systems. 238--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Grohe, M., Hernich, A., and Schweikardt, N. 2006. Randomized computations on large data sets: Tight lower bounds. In Proceedings of the 25th Annual ACM Symposium on Principles of Database Systems. 243--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Grohe, M., Hernich, A., and Schweikardt, N. 2009. Lower bounds for processing data with few random accesses to external memory. J. ACM 56, 3, 1--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gronemeier, A. 2009. Asymptotically optimal lower bounds on the NIH-multi-party information complexity of the AND-function and disjointness. In Proceedings of the 26th Annual Symposium on Theoretical Aspects of Computer Science. 505--516.Google ScholarGoogle Scholar
  14. Indyk, P. and Woodruff, D. 2003. Tight lower bounds for the distinct elements problem. In Proceedings of the 44th Annual Symposium on Foundations of Computer Science. IEEE, 283--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Indyk, P. and Woodruff, D. P. 2005. Optimal approximations of frequency moments of data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing. 202--208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jayram, T. S. 2009. Hellinger strikes back: A note on the multi-party information complexity of and. In Proceedings of APPROX-RANDOM. Lecture Notes in Computer Science Series, vol. 5687, Springer, 562--573. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Muthukrishnan, S. 2006. Data streams: Algorithms and applications. Found. Trends Theor. Comput. Sci. 1, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Razborov, A. A. 1992. On the distributional complexity of disjointness. Theor. Comput. Sci. 106, 2, 385--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Saks, M. E. and Sun, X. 2002. Space lower bounds for distance approximation in the data stream model. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing. 360--369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Steele, J. M. 1995. Variations on the monotone subsequence problem of Erdös and Szekeres. In Discrete Probability and Algorithms, Aldous, Diaconis, and Steele Eds., Springer, 111--132.Google ScholarGoogle Scholar
  21. Steele, J. M. 1997. Probability Theory and Combinatorial Optimization. SIAM, Philadelphia, PA.Google ScholarGoogle Scholar
  22. Woodruff, D. 2004. Optimal space lower bounds for all frequency moments. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms. 167--175. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The Value of Multiple Read/Write Streams for Approximating Frequency Moments

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader
              About Cookies On This Site

              We use cookies to ensure that we give you the best experience on our website.

              Learn more

              Got it!