ABSTRACT
We consider the continual count tracking problem in a distributed environment where the input is an aggregate stream that originates from k distinct sites and the updates are allowed to be non-monotonic, i.e., both increments and decrements are allowed. The goal is to continually track the count within a prescribed relative accuracy ε at the lowest possible communication cost. Specifically, we consider an adversarial setting where the input values are selected and assigned to sites by an adversary, but the order is according to a random permutation or is a random i.i.d. process. The input stream of values is allowed to be non-monotonic with an unknown drift −1 ≤ μ ≤ 1, where the case μ = 1 corresponds to the special case of a monotonic stream of only non-negative updates. We show that a randomized algorithm is guaranteed to track the count accurately with high probability and has the expected communication cost Õ(min{√k/(|μ|ε), √(kn)/ε, n}) for an input stream of length n, and we establish matching lower bounds. This improves upon the best previously known algorithm, whose expected communication cost is Θ(min{√k/ε, n}) and which applies only to an important but more restrictive class of monotonic input streams. Our results are also substantially more positive than the communication complexity of Ω(n) under fully adversarial input. We further show how our framework can accommodate other types of random input streams, including fractional Brownian motion, which has been widely used to model temporal long-range dependencies observed in many natural phenomena. Last but not least, we show how our non-monotonic counter can be applied to track the second frequency moment and to a Bayesian linear regression problem.
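To get a feel for how the three terms in the stated bound trade off, the expression min{√k/(|μ|ε), √(kn)/ε, n} can be evaluated directly. The following is only an illustrative sketch: the function name `comm_cost_bound` and the sample parameters are hypothetical, and the constants and polylogarithmic factors hidden by the Õ notation are ignored, so the numbers indicate orders of magnitude rather than actual message counts.

```python
import math


def comm_cost_bound(k: int, n: int, eps: float, mu: float) -> float:
    """Evaluate min{sqrt(k)/(|mu|*eps), sqrt(k*n)/eps, n}.

    Hypothetical helper: constants and polylog factors hidden by the
    O-tilde in the abstract are deliberately ignored.
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not -1.0 <= mu <= 1.0:
        raise ValueError("drift mu must lie in [-1, 1]")

    driftless_term = math.sqrt(k * n) / eps  # applies regardless of drift
    trivial_term = float(n)                  # forwarding every update
    if mu == 0:
        # The drift term is unbounded when mu = 0, so it never attains the min.
        return min(driftless_term, trivial_term)
    drift_term = math.sqrt(k) / (abs(mu) * eps)
    return min(drift_term, driftless_term, trivial_term)


if __name__ == "__main__":
    k, n, eps = 100, 10**8, 0.1
    for mu in (1.0, 0.001, 0.0):
        print(f"mu={mu}: bound ~ {comm_cost_bound(k, n, eps, mu):.3g}")
```

With k = 100, n = 10^8 and ε = 0.1, the expression is about 10^2 for μ = 1 (the monotonic case, matching √k/ε), about 10^5 for μ = 0.001, and about 10^6 for μ = 0, where the √(kn)/ε term takes over. For very short streams the trivial term n is the smallest.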
Index Terms
Continuous distributed counting for non-monotonic streams