DOI: 10.1145/2213556.2213597
research-article

Continuous distributed counting for non-monotonic streams

Published: 21 May 2012

ABSTRACT

We consider the continual count tracking problem in a distributed environment where the input is an aggregate stream that originates from k distinct sites and the updates are allowed to be non-monotonic, i.e., both increments and decrements are allowed. The goal is to continually track the count within a prescribed relative accuracy ε at the lowest possible communication cost. Specifically, we consider an adversarial setting where the input values are selected and assigned to sites by an adversary, but the order is according to a random permutation or is a random i.i.d. process. The input stream of values is allowed to be non-monotonic with an unknown drift -1 ≤ μ ≤ 1, where the case μ = 1 corresponds to the special case of a monotonic stream of only non-negative updates. We show that a randomized algorithm is guaranteed to track the count accurately with high probability and has expected communication cost Õ(min{√k/(|μ|ε), √(kn)/ε, n}) for an input stream of length n, and we establish matching lower bounds. This improves upon the previously best known algorithm, whose expected communication cost is Θ(min{√k/ε, n}) and which applies only to an important but more restrictive class of monotonic input streams, and our results are substantially more positive than the communication complexity of Ω(n) under fully adversarial input. We also show how our framework can accommodate other types of random input streams, including fractional Brownian motion, which has been widely used to model temporal long-range dependencies observed in many natural phenomena. Last but not least, we show how our non-monotonic counter can be applied to track the second frequency moment and to a Bayesian linear regression problem.
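
To get a feel for how the bound in the abstract behaves, the sketch below evaluates the expression min{√k/(|μ|ε), √(kn)/ε, n} for chosen parameters and compares it with the monotonic-case expression min{√k/ε, n}. It is only an illustration, not the paper's algorithm: the function names `nonmonotonic_cost` and `monotonic_cost` are hypothetical, and the constants and polylogarithmic factors hidden by Õ and Θ are dropped, so the returned numbers indicate scaling rather than actual message counts.

```python
import math


def nonmonotonic_cost(k: int, n: int, eps: float, mu: float) -> float:
    """Evaluate min{sqrt(k)/(|mu|*eps), sqrt(k*n)/eps, n}.

    Assumption: constants and polylog factors hidden by the O-tilde
    notation are ignored, so this shows scaling only.
    """
    if k <= 0 or n <= 0 or eps <= 0:
        raise ValueError("k, n and eps must be positive")
    if not -1.0 <= mu <= 1.0:
        raise ValueError("drift mu must lie in [-1, 1]")
    terms = [math.sqrt(k * n) / eps, float(n)]
    if mu != 0:
        # The drift-dependent term is unbounded at mu = 0, so it is
        # included only for non-zero drift.
        terms.append(math.sqrt(k) / (abs(mu) * eps))
    return min(terms)


def monotonic_cost(k: int, n: int, eps: float) -> float:
    """Evaluate min{sqrt(k)/eps, n}, the monotonic-case expression."""
    if k <= 0 or n <= 0 or eps <= 0:
        raise ValueError("k, n and eps must be positive")
    return min(math.sqrt(k) / eps, float(n))


if __name__ == "__main__":
    k, n, eps = 100, 10**6, 0.1
    for mu in (1.0, 0.01, 0.0):
        print(f"mu={mu}: {nonmonotonic_cost(k, n, eps, mu):.0f}")
    print(f"monotonic: {monotonic_cost(k, n, eps):.0f}")
```

With |μ| = 1 the first term reduces to √k/ε, so the two expressions coincide. As |μ| shrinks toward 0, the drift-dependent term grows and the √(kn)/ε term takes over.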


Published in

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2012
332 pages
ISBN: 9781450312486
DOI: 10.1145/2213556

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions, 26%
