ABSTRACT
We address the problem of finding a "best" deterministic answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer), which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g., rank aggregation) to probabilistic databases. We consider this problem for various types of queries, including SPJ queries, top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial-time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called the and/xor tree model, which significantly generalizes previous probabilistic database models such as x-tuples and block-independent disjoint models, and is of independent interest.
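To make the definition of a consensus world concrete, the following is a minimal brute-force sketch, not one of the algorithms from the paper. It assumes a tuple-independent database (a much simpler model than the and/xor tree) and the symmetric difference as the distance between worlds; the names `possible_worlds`, `expected_distance`, `consensus_world`, and the example probabilities are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product


def possible_worlds(probs):
    """Yield (world, probability) pairs, assuming tuples are independent."""
    tuples = list(probs)
    for mask in product([False, True], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, mask) if keep)
        p = 1.0
        for t, keep in zip(tuples, mask):
            p *= probs[t] if keep else 1.0 - probs[t]
        yield world, p


def expected_distance(candidate, probs):
    """Expected symmetric-difference distance from candidate to a random world."""
    return sum(p * len(candidate ^ world) for world, p in possible_worlds(probs))


def consensus_world(probs):
    """Exhaustively search all deterministic worlds for the minimizer."""
    tuples = list(probs)
    candidates = (
        frozenset(c)
        for r in range(len(tuples) + 1)
        for c in combinations(tuples, r)
    )
    return min(candidates, key=lambda c: expected_distance(c, probs))


# Hypothetical existence probabilities for three tuples.
probs = {"t1": 0.9, "t2": 0.6, "t3": 0.2}
best = consensus_world(probs)
print(sorted(best), expected_distance(best, probs))
```

The search is exponential in the number of tuples, so it is useful only for checking the definition on tiny inputs. In this example the minimizer keeps exactly the tuples whose probability exceeds 0.5, because under these assumptions each tuple contributes to the expected distance independently of the others.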