ABSTRACT
We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.
Supplemental Material
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google Scholar
Digital Library
- F. N. Afrati and C. H. Papadimitriou. The parallel complexity of simple logic programs. J. ACM, 40(4):891--916, 1993. Google Scholar
Digital Library
- O. Benjelloun, A. D. Sarma, A. Y. Halevy, and J. Widom. ULDBs: Databases with uncertainty and lineage. In VLDB, 2006. Google Scholar
Digital Library
- S. Bistarelli, U. Montanari, and F. Rossi. Semiring-based constraint satisfaction and optimization. JACM, 44(2):201--236, 1997. Google Scholar
Digital Library
- S. Bistarelli, U. Montanari, and F. Rossi. Semiring-based constraint logic programming: syntax and semantics. ACM TOPLAS, 23(1):1--29, 2001. Google Scholar
Digital Library
- P. Buneman, S. Khanna, and W. -C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001. Google Scholar
Digital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977. Google Scholar
Digital Library
- L. Chiticariu and W.-C. Tan. Debugging schema mappings with routes. In VLDB, 2006. Google Scholar
Digital Library
- N. Chomsky and M. -P. Schützenberger. The algebraic theory of context-free languages. Computer Programming and Formal Systems, pages 118--161, 1963.Google Scholar
Cross Ref
- M. P. Consens and A. O. Mendelzon. Low complexity aggregation in graphlog and datalog. In ICDT, 1990. Google Scholar
Digital Library
- P. Crawley and R. P. Dilworth. Algebraic Theory of Lattices. Prentice Hall, 1973.Google Scholar
- Y. Cui. Lineage Tracing in Data Warehouses. PhD thesis, Stanford University, 2001. Google Scholar
Digital Library
- Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a warehousing environment. TODS, 25(2), 2000. Google Scholar
Digital Library
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.Google Scholar
Digital Library
- F. Dong and L. V. S. Lakshmanan. Deductive databases with incomplete information. In Symposium on Logic Programming, 1992.Google Scholar
- N. Fuhr. Probabilistic datalog -- a logic for powerful retrieval methods. In SIGIR, 1995. Google Scholar
Digital Library
- N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. TOIS, 14(1), 1997. Google Scholar
Digital Library
- T. J. Green and V. Tannen. Models for incomplete and probabilistic information. In EDBT Workshops, 2006. Google Scholar
Digital Library
- T. Imielinski and W. Lipski. Incomplete information in relational databases. JACM, 31(4), 1984. Google Scholar
Digital Library
- Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: beyond relations as sets. TODS, 20(3), 1995. Google Scholar
Digital Library
- P. Kolaitis and M. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS, 1998. Google Scholar
Digital Library
- W. Kuich. Semirings and formal power series. In Handbook of formal languages, volume 1. Springer, 1997.Google Scholar
Digital Library
- L. V. S. Lakshmanan, N. Leone, R. Ross, and V. S. Subrahmanian. Probview: a flexible probabilistic database system. TODS, 22(3), 1997. Google Scholar
Digital Library
- L. V. S. Lakshmanan and F. Sadri. Probabilistic deductive databases. In Symposium on Logic Programming, 1994. Google Scholar
Digital Library
- M. Maher and R. Ramakrishnan. Déjàvu in fixpoints of logic programs. In NACLP, 1989.Google Scholar
- I. S. Mumick. Query Optimization in Deductive and Relational Databases. PhD thesis, Stanford University, 1991. Google Scholar
Digital Library
- I. S. Mumick, H. Pirahesh, and R. Ramakrishnan. The magic of duplicates and aggregates. In VLDB Journal, 1990. Google Scholar
Digital Library
- I. S. Mumick and O. Shmueli. Finiteness properties of database queries. In Fourth Australian Database Conference, 1993.Google Scholar
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. J. ACM, 27(4), 1980. Google Scholar
Digital Library
- A. D. Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working models for uncertain data. In ICDE, 2006. Google Scholar
Digital Library
- J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume II. Computer Science Press, 1989. Google Scholar
Digital Library
- L. A. Zadeh. Fuzzy sets. Inf. Control, 8(3), 1965.Google Scholar
- E. Zimányi. Query evaluation in probabilistic relational databases. TCS, 171(1-2), 1997. Google Scholar
Digital Library
Index Terms
Provenance semirings
Recommendations
Querying data provenance
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of dataMany advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases), involve computations that look at how a tuple was ...
Data Provenance for Historical Queries in Relational Database
Compute '15: Proceedings of the 8th Annual ACM India ConferenceCapturing, modeling, and querying data provenance in databases has gained considerable importance in the last decade. All kinds of applications developed on top of databases, now a days collect provenance for various purposes like trustworthiness of ...
PUG: a framework and practical implementation for why and why-not provenance
Explaining why an answer is (or is not) returned by a query is important for many applications including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for ...






Comments