ABSTRACT
In this paper we study provenance information for queries with aggregation. Provenance has been studied for various query languages that do not allow aggregation, and recent work has suggested capturing provenance by annotating database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges that render this approach inapplicable. Consequently, we propose a new approach in which we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe how each value is computed. We realize this approach in a concrete construction, first for "simple" queries, where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance-annotated databases.
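The tuple-level semiring annotation mentioned above, and the reason an aggregated value calls for more, can be illustrated with a minimal Python sketch. It is not the construction of the paper: the relations, the variable names (`e1`, `e2`, `d`), the helper functions, and the representation of an aggregate as a list of (annotation, value) pairs are all assumptions made for illustration only.

```python
from collections import Counter
from math import prod

# A provenance polynomial is a Counter mapping a monomial
# (sorted tuple of variable names) to an integer coefficient.
def var(name):
    return Counter({(name,): 1})

def add(p, q):
    """Semiring addition: alternative derivations (union, projection)."""
    out = Counter(p)
    for m, c in q.items():
        out[m] += c
    return out

def mul(p, q):
    """Semiring multiplication: joint use of tuples (join)."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def evaluate(p, valuation):
    """Specialize a polynomial by mapping each variable to a number."""
    return sum(c * prod(valuation[v] for v in m) for m, c in p.items())

def join(r, s, col_r, col_s):
    """Join two annotated relations; annotations are multiplied."""
    out = {}
    for t1, a1 in r.items():
        for t2, a2 in s.items():
            if t1[col_r] == t2[col_s]:
                t = t1 + t2
                out[t] = add(out.get(t, Counter()), mul(a1, a2))
    return out

def project(r, cols):
    """Project an annotated relation; annotations of merged tuples are added."""
    out = {}
    for t, a in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = add(out.get(key, Counter()), a)
    return out

# Hypothetical annotated relations: each tuple carries one variable.
emp = {("alice", "d1"): var("e1"), ("bob", "d1"): var("e2")}
dept = {("d1", "sales"): var("d")}

# Tuple-level provenance of the department name: d*e1 + d*e2.
depts = project(join(emp, dept, 1, 0), [3])
sales = depts[("sales",)]

# Deleting bob's tuple (e2 = 0) still leaves one derivation.
all_present = {"e1": 1, "e2": 1, "d": 1}
bob_deleted = {"e1": 1, "e2": 0, "d": 1}
derivations_all = evaluate(sales, all_present)
derivations_without_bob = evaluate(sales, bob_deleted)

# For SUM, the value itself depends on which tuples are present, so a
# tuple-level annotation alone is not enough. Illustrative assumption:
# keep the aggregate as (annotation, value) pairs and evaluate it later.
salary_sum = [(var("e1"), 10), (var("e2"), 20)]

def evaluate_sum(pairs, valuation):
    return sum(evaluate(a, valuation) * v for a, v in pairs)

total_all = evaluate_sum(salary_sum, all_present)
total_without_bob = evaluate_sum(salary_sum, bob_deleted)
```

In this sketch the annotation of the `sales` tuple records which input tuples it was derived from, while the aggregated salary keeps the contributing values apart so that the sum can be recomputed under a different valuation of the input tuples.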
Index Terms
Provenance for aggregate queries