DOI: 10.1145/1559795.1559820
research-article

Optimal tracking of distributed heavy hitters and quantiles

Published: 29 June 2009

ABSTRACT

We consider the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characterizing a data distribution. Let A be a multiset of elements, drawn from the universe U={1,...,u}. For a given 0 ≤ Φ ≤ 1, the Φ-heavy hitters are those elements of A whose frequency in A is at least Φ|A|; the Φ-quantile of A is an element x of U such that at most Φ|A| elements of A are smaller than x and at most (1-Φ)|A| elements of A are greater than x. Suppose the elements of A are received at k remote sites over time, and each of the sites has a two-way communication channel to a designated coordinator, whose goal is to track the set of Φ-heavy hitters and the Φ-quantile of A approximately at all times with minimum communication. We give tracking algorithms with worst-case communication cost O(k/ε ⋅ log n) for both problems, where n is the total number of items in A, and ε is the approximation error. This substantially improves upon the previously known algorithms. We also give matching lower bounds on the communication costs for both problems, showing that our algorithms are optimal. We also consider a more general version of the problem where we simultaneously track the Φ-quantiles for all 0 ≤ Φ ≤ 1.
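
To make the two definitions in the abstract concrete, the following is a minimal Python sketch that evaluates them exactly on a small in-memory multiset. It is an illustration only and is not the distributed tracking algorithm of the paper: it assumes all of A is available at one place, ignores communication cost and the approximation error ε, and reads the quantile definition as counting elements smaller than and greater than x. The function names `heavy_hitters`, `is_phi_quantile`, and `phi_quantile` are hypothetical, and `phi_quantile` returns an element of A rather than an arbitrary element of U.

```python
from collections import Counter


def heavy_hitters(items, phi):
    """Return the set of elements whose frequency is at least phi * |A|."""
    counts = Counter(items)
    threshold = phi * len(items)
    return {x for x, c in counts.items() if c >= threshold}


def is_phi_quantile(items, phi, x):
    """Check the definition directly: at most phi*|A| elements are smaller
    than x and at most (1 - phi)*|A| elements are greater than x."""
    n = len(items)
    smaller = sum(1 for a in items if a < x)
    greater = sum(1 for a in items if a > x)
    return smaller <= phi * n and greater <= (1 - phi) * n


def phi_quantile(items, phi):
    """Return one element of the multiset that satisfies the definition.

    Assumption: we pick the element at index floor(phi * n) of the sorted
    multiset (capped at n - 1); other valid choices may exist.
    """
    if not items:
        raise ValueError("the multiset must be non-empty")
    ordered = sorted(items)
    n = len(ordered)
    idx = min(int(phi * n), n - 1)
    return ordered[idx]


if __name__ == "__main__":
    A = [1, 2, 2, 3, 3, 3, 3, 5]
    print(heavy_hitters(A, 0.25))
    print(phi_quantile(A, 0.5))
    print(is_phi_quantile(A, 0.5, phi_quantile(A, 0.5)))
```

The tracking problem in the abstract asks for the same answers, within error ε, when the items of A arrive at k separate sites and only the coordinator must know the result, which is where the communication bound O(k/ε ⋅ log n) applies.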


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
