DOI: 10.1145/1376916.1376918
Invited talk

Curated databases

Published: 9 June 2008

ABSTRACT

Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries -- dictionaries, encyclopedias, gazetteers, etc. -- are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area.

Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.



        Published in

          ACM Conferences
          PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
          June 2008
          330 pages
          ISBN:9781605581521
          DOI:10.1145/1376916

          Copyright © 2008 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall acceptance rate: 476 of 1,835 submissions, 26%
