ABSTRACT
We study complexity and approximation of queries in an expressive query language for probabilistic databases. The language studied supports the compositional use of confidence computation. It allows for a wide range of new use cases, such as the computation of conditional probabilities and of selections based on predicates that involve marginal and conditional probabilities. These features have important applications in areas such as data cleaning and the processing of sensor data. We establish techniques for efficiently computing approximate query results and for estimating the error incurred by queries. The central difficulty is due to selection predicates based on approximated values, which may lead to the unreliable selection of tuples. A database may contain certain singularities at which approximation of predicates cannot be achieved; however, the paper presents an algorithm that provides efficient approximation otherwise.
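The difficulty with selection predicates over approximated values can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes plain Monte Carlo sampling with a Hoeffding bound, and the names `estimate_confidence`, `hoeffding_radius`, and `decide_threshold` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def estimate_confidence(sample_event, n, rng):
    """Monte Carlo estimate of P(event) from n independent samples.

    sample_event is a hypothetical callback: it draws one possible world
    using rng and returns True if the event holds in that world.
    """
    hits = sum(1 for _ in range(n) if sample_event(rng))
    return hits / n


def hoeffding_radius(n, delta):
    """Half-width eps such that P(|estimate - p| >= eps) <= delta.

    Follows from Hoeffding's inequality: 2 * exp(-2 * n * eps**2) = delta.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def decide_threshold(estimate, radius, threshold):
    """Decide the predicate p >= threshold from an approximate value of p.

    Returns True or False only when the whole interval
    [estimate - radius, estimate + radius] lies on one side of the
    threshold, and None when the interval straddles it.
    """
    if estimate - radius >= threshold:
        return True
    if estimate + radius < threshold:
        return False
    return None  # undecided: more samples would be needed


if __name__ == "__main__":
    rng = random.Random(0)
    n, delta = 2000, 0.05
    # Illustrative event with true probability 0.3 (an assumption).
    p_hat = estimate_confidence(lambda r: r.random() < 0.3, n, rng)
    eps = hoeffding_radius(n, delta)
    print(p_hat, eps, decide_threshold(p_hat, eps, 0.5))
```

In this sketch, a tuple whose true probability equals the threshold exactly is never decided: the interval contains the threshold for every sample size, which loosely corresponds to the singularities mentioned in the abstract.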
Index Terms
Approximating predicates and expressive queries on probabilistic databases