DOI: 10.1145/1559795.1559835
research-article

Consensus answers for queries over probabilistic databases

Published: 29 June 2009

ABSTRACT

We address the problem of finding a "best" deterministic query answer to a query over a probabilistic database. For this purpose, we propose the notion of a consensus world (or a consensus answer), which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). This problem can be seen as a generalization of the well-studied inconsistent information aggregation problems (e.g., rank aggregation) to probabilistic databases. We consider this problem for various types of queries, including SPJ queries, top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial-time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness). Most of our results are for a general probabilistic database model, called the and/xor tree model, which significantly generalizes previous probabilistic database models such as x-tuples and block-independent disjoint models, and is of independent interest.
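
To make the definition of a consensus world concrete, the following is a minimal sketch under assumptions that the abstract itself does not fix: a tuple-independent database (each tuple is present independently with a given probability) and symmetric difference as the distance between worlds. The function names and the example probabilities are hypothetical. The brute-force search applies the definition directly by minimizing the expected distance over all deterministic worlds. For this particular distance, linearity of expectation gives the expected distance as the sum of (1 - p) over included tuples plus the sum of p over excluded tuples, so keeping exactly the tuples with probability above 0.5 is also a minimizer, and the sketch checks both routes against each other.

```python
from itertools import combinations, product


def possible_worlds(probs):
    """Yield (world, probability) pairs for a tuple-independent database.

    `probs` maps a tuple identifier to its (assumed independent) presence probability.
    """
    tuples = sorted(probs)
    for present in product([False, True], repeat=len(tuples)):
        world = frozenset(t for t, keep in zip(tuples, present) if keep)
        weight = 1.0
        for t, keep in zip(tuples, present):
            weight *= probs[t] if keep else 1.0 - probs[t]
        yield world, weight


def expected_distance(candidate, probs):
    """Expected symmetric-difference distance from `candidate` to a random world."""
    return sum(w * len(candidate ^ world) for world, w in possible_worlds(probs))


def consensus_world_bruteforce(probs):
    """Apply the definition directly: try every deterministic world (exponential)."""
    tuples = sorted(probs)
    candidates = (
        frozenset(c)
        for r in range(len(tuples) + 1)
        for c in combinations(tuples, r)
    )
    return min(candidates, key=lambda c: expected_distance(c, probs))


def consensus_world_threshold(probs):
    """Shortcut for symmetric difference only: keep tuples with probability > 0.5."""
    return frozenset(t for t, p in probs.items() if p > 0.5)


# Hypothetical three-tuple database, used purely for illustration.
example = {"t1": 0.9, "t2": 0.3, "t3": 0.6}
best = consensus_world_bruteforce(example)
print(sorted(best), expected_distance(best, example))
```

The brute-force routine is exponential in the number of tuples and is meant only to illustrate the definition on tiny inputs. Other distance measures and other query types (for example top-k answers or clusterings) need different algorithms, and the threshold shortcut does not carry over to them.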


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
