ABSTRACT
Querying uncertain data has emerged as an important problem in data management because many kinds of measurement data are inherently imprecise. In this paper we study the problem of answering range queries over uncertain data. Specifically, we are given a collection P of n points in R, each represented by its one-dimensional probability density function (pdf). The goal is to build an index on P such that, given a query interval I and a probability threshold τ, we can quickly report all points of P that lie in I with probability at least τ. We present several indexing schemes with linear or near-linear space and logarithmic query time. Our schemes support pdfs that are histograms as well as more complex ones, such as Gaussian or piecewise-algebraic pdfs. They also extend to the external memory model, in which the goal is to minimize the number of disk accesses when querying the index.
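
To make the query semantics concrete, the following is a minimal sketch of a brute-force baseline for histogram pdfs. It is not one of the indexing schemes of the paper: it scans all n points on every query, which is the cost an index is meant to avoid. The names HistogramPoint, prob_in, and threshold_range_query are hypothetical and chosen only for illustration, and each pdf is assumed to be piecewise constant on its buckets.

```python
from bisect import bisect_right
from itertools import accumulate


class HistogramPoint:
    """Uncertain point on the real line with a histogram pdf (illustrative only).

    edges  : sorted bucket boundaries, length k + 1
    masses : probability mass of each bucket, length k, summing to 1
    The pdf is assumed constant inside each bucket.
    """

    def __init__(self, name, edges, masses):
        self.name = name
        self.edges = list(edges)
        # cum[i] = total probability mass to the left of edges[i]
        self.cum = [0.0] + list(accumulate(masses))

    def cdf(self, x):
        """Return Pr[point <= x]."""
        if x <= self.edges[0]:
            return 0.0
        if x >= self.edges[-1]:
            return 1.0
        i = bisect_right(self.edges, x) - 1  # bucket containing x
        lo, hi = self.edges[i], self.edges[i + 1]
        frac = (x - lo) / (hi - lo)          # linear interpolation in the bucket
        return self.cum[i] + frac * (self.cum[i + 1] - self.cum[i])

    def prob_in(self, a, b):
        """Return Pr[a <= point <= b] for the query interval I = [a, b]."""
        return self.cdf(b) - self.cdf(a)


def threshold_range_query(points, a, b, tau):
    """Naive O(n) scan: report every point lying in [a, b] with probability >= tau."""
    return [p.name for p in points if p.prob_in(a, b) >= tau]
```

For example, a point uniform on [0, 4] lies in I = [1, 3] with probability 0.5, so it is reported for τ = 0.5 but not for τ = 0.6.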