ABSTRACT
The paper investigates the question of whether a partially closed database has complete information to answer a query. In practice, an enterprise often maintains master data Dm, a closed-world database. We say that a database D is partially closed if it satisfies a set V of containment constraints of the form "q(D) is a subset of p(Dm)", where q is a query in a language Lc and p is a projection query. The part of D not constrained by (Dm,V) is open, and some tuples may be missing from it. The database D is said to be complete for a query Q relative to (Dm,V) if for all partially closed extensions D' of D, Q(D')=Q(D), i.e., adding tuples to D either violates some constraints in V or does not change the answer to Q.
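The definition of relative completeness can be illustrated with a small brute-force sketch. It is not the paper's decision procedure: it assumes a single relation, models queries as plain Python functions, and only considers extensions drawn from a finite pool of candidate tuples. All names (`is_partially_closed`, `is_relatively_complete`, the customer data) are hypothetical and chosen for illustration.

```python
from itertools import combinations

def is_partially_closed(db, master, constraints):
    """Check every containment constraint: q(db) is a subset of p(master)."""
    return all(q(db) <= p(master) for q, p in constraints)

def is_relatively_complete(db, master, constraints, query, candidates):
    """Brute-force test that no partially closed extension changes Q's answer.

    Assumption: extensions only add tuples from the finite pool `candidates`.
    """
    if not is_partially_closed(db, master, constraints):
        raise ValueError("db must itself be partially closed")
    baseline = query(db)
    extra = [t for t in candidates if t not in db]
    for k in range(1, len(extra) + 1):
        for added in combinations(extra, k):
            extended = db | frozenset(added)
            if (is_partially_closed(extended, master, constraints)
                    and query(extended) != baseline):
                return False  # a legal extension changed the answer
    return True

# Hypothetical master data: (customer id, name) for all UK customers.
master = frozenset({(1, "ann"), (2, "bob")})

# One containment constraint: ids of UK customers in D must appear in Dm.
constraints = [
    (lambda d: {cid for cid, region in d if region == "uk"},  # q(D)
     lambda m: {cid for cid, _name in m}),                    # p(Dm)
]

# D holds (customer id, region) tuples.
db = frozenset({(1, "uk"), (2, "uk")})
candidates = {(1, "uk"), (2, "uk"), (3, "uk"), (1, "us"), (3, "us")}

def uk_customers(d):
    return {cid for cid, region in d if region == "uk"}

def us_customers(d):
    return {cid for cid, region in d if region == "us"}

complete_for_uk = is_relatively_complete(db, master, constraints, uk_customers, candidates)
complete_for_us = is_relatively_complete(db, master, constraints, us_customers, candidates)
```

In this toy setting D is complete for the UK query, since any new UK tuple either is already present or violates the constraint, but it is not complete for the US query, because the US part of D is open.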
We first show that the proposed model can also capture the consistency of data, in addition to its relative completeness. Indeed, integrity constraints studied for consistency can be expressed as containment constraints. We then study two problems. One is to decide, given Dm, V, a query Q in a language Lq and a partially closed database D, whether D is complete for Q relative to (Dm,V). The other is to determine, given Dm, V and Q, whether there exists a partially closed database that is complete for Q relative to (Dm,V). We establish matching lower and upper bounds on these problems for a variety of languages Lq and Lc. We also provide characterizations for a database to be relatively complete, and for a query to allow a relatively complete database, when Lq and Lc are conjunctive queries.
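As a rough illustration of how an integrity constraint might be written as a containment constraint, the sketch below encodes a functional dependency (customer id determines region) by requiring that a violation-finding query return a subset of an empty projection. This encoding is an assumption made for illustration, and the function names and data are hypothetical rather than taken from the paper.

```python
def fd_violations(db):
    """q(D): customer ids that are associated with more than one region."""
    return {a[0] for a in db for b in db if a[0] == b[0] and a[1] != b[1]}

def empty_projection(master):
    """p(Dm): projection of an (assumed) empty master relation."""
    return set()

def satisfies_all(db, master, constraints):
    """Check every containment constraint: q(db) is a subset of p(master)."""
    return all(q(db) <= p(master) for q, p in constraints)

# The functional dependency "id -> region" as a single containment constraint.
fd_as_constraint = [(fd_violations, empty_projection)]

master = frozenset()                               # not consulted by this constraint
consistent_db = frozenset({(1, "uk"), (2, "us")})
inconsistent_db = frozenset({(1, "uk"), (1, "us")})

consistent_ok = satisfies_all(consistent_db, master, fd_as_constraint)
inconsistent_ok = satisfies_all(inconsistent_db, master, fd_as_constraint)
```

Under this encoding, a database is consistent with the dependency exactly when the violation query is empty, so the same containment check covers both consistency and partial closure.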