DOI: 10.1145/1559795.1559816
research-article

Indexing uncertain data

Published: 29 June 2009

ABSTRACT

Querying uncertain data has emerged as an important problem in data management due to the imprecise nature of many measurement data. In this paper we study answering range queries over uncertain data. Specifically, we are given a collection P of n points in R, each represented by its one-dimensional probability density function (pdf). The goal is to build an index on P such that given a query interval I and a probability threshold τ, we can quickly report all points of P that lie in I with probability at least τ. We present various indexing schemes with linear or near-linear space and logarithmic query time. Our schemes support pdf's that are either histograms or more complex ones such as Gaussian or piecewise algebraic. They also extend to the external memory model in which the goal is to minimize the number of disk accesses when querying the index.
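To make the query in the abstract concrete, here is a minimal brute-force sketch for the histogram case. It is not one of the paper's indexing schemes: it scans all n points in linear time and serves only as a baseline that an index would need to match in output. The names `HistogramPdf`, `prob_in`, and `threshold_range_query` are hypothetical, and the sketch assumes each histogram's bucket masses sum to 1.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HistogramPdf:
    """Piecewise-constant pdf (hypothetical helper type).

    Bucket i spans [edges[i], edges[i + 1]) and carries total
    probability masses[i]; masses are assumed to sum to 1.
    """
    edges: Sequence[float]
    masses: Sequence[float]

    def cdf(self, x: float) -> float:
        """Return Pr[point <= x]."""
        if x <= self.edges[0]:
            return 0.0
        if x >= self.edges[-1]:
            return float(sum(self.masses))
        i = bisect_right(self.edges, x) - 1      # bucket containing x
        below = sum(self.masses[:i])             # mass of full buckets left of x
        width = self.edges[i + 1] - self.edges[i]
        frac = (x - self.edges[i]) / width       # covered share of bucket i
        return below + frac * self.masses[i]

    def prob_in(self, lo: float, hi: float) -> float:
        """Return Pr[point lies in the interval [lo, hi]]."""
        return self.cdf(hi) - self.cdf(lo)


def threshold_range_query(points: Sequence[HistogramPdf],
                          lo: float, hi: float, tau: float) -> List[int]:
    """Report indices of points that lie in [lo, hi] with probability >= tau.

    Linear scan over all points: a correctness baseline, not an index.
    """
    return [i for i, p in enumerate(points) if p.prob_in(lo, hi) >= tau]
```

Raising the threshold τ can only shrink the reported set, since each point's probability of lying in I is fixed once I is chosen.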


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions, 26%
