ABSTRACT
Provenance information has proven very effective in capturing the computational process performed by queries, and it has been used extensively as input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). Here we study the core of provenance information, namely the part of the provenance that appears in the computation of every query equivalent to a given one. This provenance core is informative because it describes the part of the computational process that is inherent to the query. It is also useful as a compact input to the data management tools mentioned above. We study algorithms that, given a query, compute an equivalent query that realizes the core provenance for all tuples in its result, and we consider these algorithms for queries of varying expressive power. Finally, we observe that, in general, one would not want to require database systems to evaluate a specific query that realizes the core provenance. Instead, one would like to find, possibly offline, the core provenance of a given output tuple (computed by an arbitrary equivalent query) without rewriting the query. We provide algorithms for such direct computation of the core provenance.
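To make the idea of a smaller "inherent" provenance concrete, here is a minimal sketch, not the paper's algorithm. It swaps in two standard ingredients: provenance polynomials in the N[X] semiring (a sum over derivations of the product of the tuple annotations each derivation uses) and classical Chandra and Merlin minimization of conjunctive queries via homomorphisms. It assumes conjunctive queries with variables only (no constants or comparisons); the names `homomorphism`, `minimize` and `provenance`, the data layout, and the toy database are hypothetical choices made for illustration. The abstract says the paper covers queries of varying expressive power, where plain minimization need not suffice.

```python
from collections import Counter
from itertools import product

# Sketch assumptions: a conjunctive query is (head, atoms), where head is a tuple
# of variable names and each atom is (relation, terms); every term is a variable.
# A database maps relation -> {tuple: annotation}, each annotation a tuple id.


def homomorphism(src_atoms, dst_atoms, fixed):
    """Backtracking search for a variable mapping h that is the identity on
    `fixed` and sends every atom of src_atoms to some atom of dst_atoms."""
    def extend(i, h):
        if i == len(src_atoms):
            return h
        rel, terms = src_atoms[i]
        for drel, dterms in dst_atoms:
            if drel != rel or len(dterms) != len(terms):
                continue
            h2 = dict(h)
            if all(h2.setdefault(s, d) == d for s, d in zip(terms, dterms)):
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None
    return extend(0, {v: v for v in fixed})


def minimize(head, atoms):
    """Chandra-Merlin style minimization: drop an atom whenever the whole query
    still maps into the remaining atoms with the head variables fixed."""
    atoms = list(atoms)
    changed = True
    while changed:
        changed = False
        for i in range(len(atoms)):
            rest = atoms[:i] + atoms[i + 1:]
            if homomorphism(atoms, rest, head) is not None:
                atoms, changed = rest, True
                break
    return head, atoms


def provenance(head, atoms, db, answer):
    """N[X] provenance polynomial of `answer`, as a Counter mapping each
    monomial (sorted tuple of annotations) to its coefficient."""
    variables = sorted({t for _, terms in atoms for t in terms})
    domain = sorted({c for rel in db.values() for tup in rel for c in tup})
    poly = Counter()
    for values in product(domain, repeat=len(variables)):
        val = dict(zip(variables, values))
        if tuple(val[v] for v in head) != answer:
            continue
        monomial = []
        for rel, terms in atoms:
            tup = tuple(val[t] for t in terms)
            if tup not in db.get(rel, {}):
                break  # this valuation does not satisfy the body
            monomial.append(db[rel][tup])
        else:
            poly[tuple(sorted(monomial))] += 1
    return poly


if __name__ == "__main__":
    db = {"R": {("a", "b"): "p", ("a", "c"): "r"}}
    q = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])  # Q(x) :- R(x,y), R(x,z)
    print(provenance(*q, db, ("a",)))             # represents p^2 + 2pr + r^2
    print(provenance(*minimize(*q), db, ("a",)))  # represents p + r
```

On this toy input, the query Q(x) :- R(x,y), R(x,z) yields p^2 + 2pr + r^2 for the answer (a), while its minimized equivalent Q(x) :- R(x,z) yields only p + r. This is the kind of compaction the abstract describes. The brute-force evaluation enumerates all valuations and is meant only for tiny examples.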