ABSTRACT
We consider the estimation of aggregates over a data stream of multidimensional axis-aligned rectangles. Rectangles are a basic primitive object in spatial databases, and efficient aggregation of rectangles is a fundamental task. The data stream model has emerged as a de facto model for processing massive databases in which the data resides in external memory or the cloud and is streamed through main memory. For a point $p$, let $n(p)$ denote the sum of the weights of all rectangles in the stream that contain $p$. We give near-optimal solutions for basic problems, including (1) the $k$-th frequency moment $F_k = \sum_{p} |n(p)|^k$, where the sum ranges over all points $p$, (2) the counting version of stabbing queries, which seeks an estimate of $n(p)$ for a given point $p$, and (3) identification of heavy hitters, i.e., points $p$ for which $n(p)$ is large. An important special case of $F_k$ is $F_0$, which corresponds to the volume of the union of the rectangles. Computing this volume is a celebrated problem in computational geometry known as "Klee's measure problem", and our work yields the first solution in the streaming model for dimensions greater than one.
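As a concrete reading of these definitions, the Python sketch below computes $n(p)$, $F_k$, stabbing counts, and heavy hitters exactly by materializing every covered point of a small integer grid. It is a brute-force baseline under assumed conventions, not the streaming algorithm of this paper: rectangles are taken to be closed boxes with integer corners on a discrete grid, $0^0 = 0$ so that $F_0$ counts points with $n(p) \neq 0$, and "large" is made precise by a hypothetical threshold $|n(p)| \ge \phi F_1$. The helper names (`coverage`, `frequency_moment`, `stab`, `heavy_hitters`) are illustrative only. Its memory grows with the covered volume, which is exactly the cost a small-space streaming algorithm must avoid.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed representation (not from the paper): a rectangle is (lo, hi, weight),
# where lo and hi are d-tuples of integers and the box covers every grid point p
# with lo[i] <= p[i] <= hi[i] in each dimension i.
Rect = Tuple[Tuple[int, ...], Tuple[int, ...], int]
Point = Tuple[int, ...]


def coverage(rects: List[Rect]) -> Dict[Point, int]:
    """Exact n(p) for every grid point covered by at least one rectangle.

    Brute force: memory is proportional to the total covered volume.
    """
    n: Dict[Point, int] = {}
    for lo, hi, w in rects:
        ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
        for p in product(*ranges):
            n[p] = n.get(p, 0) + w
    return n


def frequency_moment(n: Dict[Point, int], k: int) -> int:
    """F_k = sum over p of |n(p)|^k, using the convention 0^0 = 0,
    so F_0 counts the points p with n(p) != 0."""
    if k == 0:
        return sum(1 for v in n.values() if v != 0)
    return sum(abs(v) ** k for v in n.values())


def stab(n: Dict[Point, int], p: Point) -> int:
    """Counting stabbing query: total weight of the rectangles containing p."""
    return n.get(p, 0)


def heavy_hitters(n: Dict[Point, int], phi: float) -> List[Point]:
    """Points with |n(p)| >= phi * F_1 (a hypothetical threshold choice)."""
    f1 = frequency_moment(n, 1)
    return sorted(p for p, v in n.items() if abs(v) >= phi * f1)
```

For example, the unit-weight boxes $[0,2]\times[0,1]$ and $[1,3]\times[1,2]$ each cover 6 grid points and overlap in 2, so $F_0 = 10$ (the area of their union on the grid), $F_1 = 12$, and $F_2 = 16$; the two overlap points are the only ones with $n(p) = 2$.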
Rectangle-efficient aggregation in spatial data streams