ABSTRACT
Dependency theory is almost as old as relational databases themselves and has traditionally been used, among other things, to improve the quality of schemas. Recently there has been renewed interest in dependencies for improving the quality of data. The increasing demand for data quality technology has also motivated revisions of classical dependencies to capture more inconsistencies in real-life data, and to match, repair and query inconsistent data. This paper provides an overview of recent advances in revising classical dependencies for improving data quality.
Index Terms
Dependencies revisited for improving data quality