DOI: 10.1145/1989284.1989295
research-article

Determining the currency of data

Published: 13 June 2011

ABSTRACT

Data in real-life databases become obsolete rapidly. One often finds that multiple values of the same entity reside in a database. While all of these values were once correct, most of them may have become stale and inaccurate. Worse still, the values often do not carry reliable timestamps. With this comes the need for studying data currency, to identify the current value of an entity in a database and to answer queries with the current values, in the absence of timestamps.

This paper investigates the currency of data. (1) We propose a model that specifies partial currency orders in terms of simple constraints. The model also allows us to express what values are copied from other data sources, bearing currency orders in those sources, in terms of copy functions defined on correlated attributes. (2) We study fundamental problems for data currency, to determine whether a specification is consistent, whether a value is more current than another, and whether a query answer is certain no matter how partial currency orders are completed. (3) Moreover, we identify several problems associated with copy functions, to decide whether a copy function imports sufficient current data to answer a query, whether such a function copies redundant data, whether a copy function can be extended to import necessary current data for a query while respecting the constraints, and whether it suffices to copy data of a bounded size. (4) We establish upper and lower bounds of these problems, all matching, for combined complexity and data complexity, and for a variety of query languages. We also identify special cases that warrant lower complexity.
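The notions of consistency and certain answers mentioned above can be illustrated with a toy sketch. It is not the paper's model: it assumes a single attribute, represents a partial currency order as a set of pairs, ignores constraints and copy functions, and enumerates completions by brute force. All names (`completions`, `is_consistent`, `certain_current_value`) and the sample data are hypothetical.

```python
from itertools import permutations


def completions(tuple_ids, partial_order):
    """Yield every total order (least current first) that respects partial_order.

    partial_order is a set of pairs (a, b) read as "a is less current than b".
    Brute force over all permutations, so only suitable for tiny examples.
    """
    for perm in permutations(tuple_ids):
        position = {t: i for i, t in enumerate(perm)}
        if all(position[a] < position[b] for a, b in partial_order):
            yield perm


def is_consistent(tuple_ids, partial_order):
    """Toy consistency check: at least one completion exists."""
    return next(completions(tuple_ids, partial_order), None) is not None


def certain_current_value(values, partial_order):
    """Return the value that is most current in every completion, else None."""
    candidates = {
        values[perm[-1]] for perm in completions(list(values), partial_order)
    }
    return candidates.pop() if len(candidates) == 1 else None


# Hypothetical data: three tuples for one entity, no timestamps.
status = {"t1": "single", "t2": "married", "t3": "married"}
order = {("t1", "t2"), ("t1", "t3")}  # t1 is known to be older than t2 and t3

consistent = is_consistent(list(status), order)
current = certain_current_value(status, order)
```

Here the relative order of `t2` and `t3` is unknown, yet both completions end in a tuple whose value is `"married"`, so that value is certain. With an empty partial order the most current value would differ between completions and the function would return `None`.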

Published in

PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2011
332 pages
ISBN: 9781450306607
DOI: 10.1145/1989284

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
