ABSTRACT
We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our starting point is the count-tracking problem. There are k players, each holding a counter n_i that is incremented over time, and the goal is to continuously track an ε-approximation of their sum n = ∑_i n_i at all times while using minimum communication. The deterministic communication complexity of this problem is Θ(k/ε · log N), where N is the final value of n when tracking finishes. We show that randomization reduces the communication cost to Θ(√k/ε · log N). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even when each player has unlimited computing power. We then extend our techniques to two related distributed tracking problems, frequency tracking and rank tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large-scale data monitoring and analysis, and both have been studied extensively in the literature.
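The abstract does not spell out the protocols, so the following is only a hedged illustration of where the k versus √k gap can come from. It contrasts two schemes. The first is a simple deterministic baseline: a player reports its exact count whenever that count has grown by a factor of (1 + ε) since its last report. That costs O((1/ε) · log N) messages per player, or O(k/ε · log N) in total. The second is one randomized scheme that we assume is in the spirit of the result above: on every increment, a player sends its current count with probability p. The coordinator then estimates n_i as (last reported value) − 1 + 1/p, or as 0 if the player has never reported. This estimate is unbiased with variance at most 1/p², so the total standard deviation is at most √k/p. Choosing p = √k/(εn) keeps it within εn while sending about p·n = √k/ε messages. Over the O(log N) doublings of n, that is consistent with the √k/ε · log N bound quoted above. The function names, parameters, and the choice of p are our assumptions. The sketch also fixes p for a single "round" in which n is assumed known in advance. A full tracker would have to lower p as n grows, and this sketch omits that step. Because each player's decisions depend only on its own count and coins, it simulates players one at a time rather than interleaving their increments.

```python
import math
import random


def deterministic_track(counts, eps):
    """Hypothetical baseline: player i reports its exact count n_i whenever
    n_i >= (1 + eps) * (last reported value). Returns (estimate, messages)."""
    estimate, messages = 0, 0
    for c in counts:
        last = 0
        for n_i in range(1, c + 1):  # the player's counter after each increment
            if last == 0 or n_i >= (1 + eps) * last:
                last = n_i  # send the exact local count to the coordinator
                messages += 1
        estimate += last  # coordinator sums the last reported values
    return estimate, messages


def randomized_track(counts, p, rng):
    """Hypothetical single-round randomized scheme with a fixed probability p.
    On each increment the player sends its current count with probability p;
    the coordinator estimates n_i as last - 1 + 1/p (0 if nothing was sent)."""
    estimate, messages = 0.0, 0
    for c in counts:
        last = None
        for n_i in range(1, c + 1):
            if rng.random() < p:  # independent coin flip per increment
                last = n_i
                messages += 1
        if last is not None:
            estimate += last - 1 + 1 / p  # unbiased estimate of n_i
    return estimate, messages


if __name__ == "__main__":
    k, eps = 16, 0.1
    counts = [1000] * k
    n = sum(counts)
    # Assumed choice: standard deviation <= sqrt(k)/p = eps * n.
    p = min(1.0, math.sqrt(k) / (eps * n))
    det_est, det_msgs = deterministic_track(counts, eps)
    rnd_est, rnd_msgs = randomized_track(counts, p, random.Random(42))
    print(f"true n = {n}")
    print(f"deterministic: estimate {det_est}, messages {det_msgs}")
    print(f"randomized:    estimate {rnd_est:.0f}, messages {rnd_msgs}")
```

In this setting, the randomized run sends about √k/ε = 40 messages in expectation. The deterministic baseline sends on the order of (1/ε) · ln(1000) messages per player, across 16 players. The randomized estimate meets the εn error target only in a probabilistic sense: its standard deviation is at most εn, and the bound does not hold for every single run.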
Randomized algorithms for tracking distributed count, frequencies, and ranks