DOI: 10.1145/2213556.2213596
research-article

Randomized algorithms for tracking distributed count, frequencies, and ranks

Published: 21 May 2012

ABSTRACT

We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are k players, each holding a counter n_i that gets incremented over time, and the goal is to track an ε-approximation of their sum n = ∑_i n_i continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is Θ(k/ε · log N), where N is the final value of n when the tracking finishes, we show that with randomization, the communication cost can be reduced to Θ(√k/ε · log N). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature.
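
To make the problem setup concrete, the following is a minimal Python sketch of count-tracking with a simple deterministic reporting rule. It is not the randomized algorithm of the paper, which the abstract does not spell out. It assumes a hypothetical rule in which each player reports its counter to the coordinator whenever the counter has grown by a factor of (1 + ε) since its last report; the names `Site`, `Coordinator`, and `simulate` are illustrative only. Under this rule each player sends roughly (1/ε) · log N messages, so the total is on the order of k/ε · log N, in line with the deterministic cost quoted above.

```python
import random


class Site:
    """One player holding a local counter n_i (hypothetical helper class)."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0      # true local counter n_i
        self.last_sent = 0  # value most recently reported to the coordinator

    def increment(self):
        """Increment the counter; return it if a report is due, else None."""
        self.count += 1
        # Assumed rule: report on the first increment, and afterwards whenever
        # the counter has grown by a factor of (1 + eps) since the last report.
        if self.last_sent == 0 or self.count >= (1 + self.eps) * self.last_sent:
            self.last_sent = self.count
            return self.count
        return None


class Coordinator:
    """Tracks the sum of the last reported counter values."""

    def __init__(self, k):
        self.reported = [0] * k
        self.messages = 0

    def receive(self, i, value):
        self.reported[i] = value
        self.messages += 1

    def estimate(self):
        return sum(self.reported)


def simulate(k, eps, total, seed=0):
    """Feed `total` increments to random sites.

    Returns the worst relative error seen at any time and the message count.
    """
    rng = random.Random(seed)
    sites = [Site(eps) for _ in range(k)]
    coord = Coordinator(k)
    worst = 0.0
    for n in range(1, total + 1):
        i = rng.randrange(k)
        value = sites[i].increment()
        if value is not None:
            coord.receive(i, value)
        worst = max(worst, abs(n - coord.estimate()) / n)
    return worst, coord.messages


if __name__ == "__main__":
    worst, messages = simulate(k=8, eps=0.1, total=20000)
    print(f"worst relative error: {worst:.4f}, messages sent: {messages}")
```

Between two reports a player's counter stays below (1 + ε) times its last reported value, so the coordinator's sum never undershoots the true total n by more than εn, and it never overshoots because counters only grow.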

Published in

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2012
332 pages
ISBN: 9781450312486
DOI: 10.1145/2213556

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
