DOI: 10.1145/1989284.1989303
research-article

On provenance minimization

Published: 13 June 2011

ABSTRACT

Provenance information has proven very effective in capturing the computational process performed by queries, and has been used extensively as input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We study here the core of provenance information, namely the part of the provenance that appears in the computation of every query equivalent to the given one. This provenance core is informative, as it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the above-mentioned data management tools. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result. We study these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance; rather, one would want to find, possibly off-line, the core provenance of a given output tuple (computed by an arbitrary equivalent query) without rewriting the query. We provide algorithms for such direct computation of the core provenance.
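
To make the idea of equivalent queries with different provenance concrete, here is a minimal sketch. It is not one of the paper's algorithms. It assumes the standard provenance-polynomial semantics, in which the annotation of an output tuple is the sum, over all derivations, of the product of the annotations of the input tuples used. The relation `R`, the tokens `p` and `q`, and the helper `provenance` are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical annotated relation R(x, y): each tuple carries a provenance token.
R = {("a", "b"): "p", ("a", "c"): "q"}

def provenance(num_atoms):
    """Provenance polynomials for Q(x) :- R(x, y1), ..., R(x, yk), k = num_atoms.

    Returns a dict from output value x to a Counter that maps each monomial
    (a sorted tuple of tokens) to its integer coefficient.
    """
    result = {}
    # Each choice of one R-tuple per atom is a candidate derivation.
    for combo in product(R.items(), repeat=num_atoms):
        xs = {tup[0] for tup, _ in combo}
        if len(xs) != 1:
            continue  # the atoms must agree on the shared variable x
        x = xs.pop()
        monomial = tuple(sorted(token for _, token in combo))
        result.setdefault(x, Counter())[monomial] += 1
    return result

redundant = provenance(2)  # Q(x)  :- R(x, y), R(x, z)
minimal = provenance(1)    # Q'(x) :- R(x, y)

print(redundant["a"])  # p*p + 2*p*q + q*q
print(minimal["a"])    # p + q
```

Under set semantics the two queries return the same tuples, yet the redundant join yields the polynomial p² + 2pq + q² while the single-atom query yields p + q. The smaller polynomial gives the intuition behind a compact provenance shared by equivalent queries; the paper's formal definition of the core should be consulted for the precise notion.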

Published in

PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2011
332 pages
ISBN: 9781450306607
DOI: 10.1145/1989284

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
