DOI: 10.1145/2213556.2213595

Rectangle-efficient aggregation in spatial data streams

Published: 21 May 2012

ABSTRACT

We consider the estimation of aggregates over a data stream of multidimensional axis-aligned rectangles. Rectangles are a basic primitive object in spatial databases, and efficient aggregation of rectangles is a fundamental task. The data stream model has emerged as a de facto model for processing massive databases in which the data resides in external memory or the cloud and is streamed through main memory. For a point $p$, let $n(p)$ denote the sum of the weights of all rectangles in the stream that contain $p$. We give near-optimal solutions for basic problems, including (1) the $k$-th frequency moment $F_k = \sum_{p} |n(p)|^k$, where the sum ranges over all points $p$, (2) the counting version of stabbing queries, which seeks an estimate of $n(p)$ given $p$, and (3) identification of heavy-hitters, i.e., points $p$ for which $n(p)$ is large. An important special case of $F_k$ is $F_0$, which corresponds to the volume of the union of the rectangles. This is a celebrated problem in computational geometry known as "Klee's measure problem", and our work yields the first solution in the streaming model for dimensions greater than one.
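
To make the quantities in the abstract concrete, the following Python sketch computes them exactly by brute force on a small integer grid. It is not the paper's streaming algorithm and uses space proportional to the number of covered points; it only illustrates the definitions of n(p), F_k, stabbing counts, and heavy hitters. The function names, the representation of a rectangle as a tuple of inclusive integer intervals, and the explicit heavy-hitter threshold are all assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import product


def point_weights(stream):
    """Return n(p) for every grid point p covered by at least one rectangle.

    `stream` is an iterable of (rect, weight) pairs, where `rect` is a tuple
    of inclusive integer intervals (lo, hi), one per dimension. This
    representation is an assumption of this sketch.
    """
    n = defaultdict(int)
    for rect, weight in stream:
        # Enumerate every integer point inside the axis-aligned rectangle.
        for p in product(*(range(lo, hi + 1) for lo, hi in rect)):
            n[p] += weight
    return n


def frequency_moment(n, k):
    """F_k = sum over points p of |n(p)|^k; F_0 counts points with n(p) != 0."""
    if k == 0:
        return sum(1 for v in n.values() if v != 0)
    return sum(abs(v) ** k for v in n.values())


def stab(n, p):
    """Counting stabbing query: total weight of rectangles containing p."""
    return n.get(p, 0)


def heavy_hitters(n, threshold):
    """Points whose |n(p)| meets a caller-chosen threshold (an assumption)."""
    return {p: v for p, v in n.items() if abs(v) >= threshold}


if __name__ == "__main__":
    # Two overlapping 2x2 squares in two dimensions, with weights 1 and 2.
    stream = [(((0, 1), (0, 1)), 1), (((1, 2), (1, 2)), 2)]
    n = point_weights(stream)
    print(frequency_moment(n, 0))  # number of grid points in the union
    print(frequency_moment(n, 2))
    print(stab(n, (1, 1)))
    print(heavy_hitters(n, 3))
```

On a discrete grid, F_0 is the number of grid points covered by the union, which plays the role of the union's volume in Klee's measure problem. A streaming solution would have to estimate these values without materializing the table `n`.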
