DOI: 10.1145/1376916.1376954
research-article

Annotated XML: queries and provenance

Published: 9 June 2008

ABSTRACT

We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.
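To make the idea of semiring annotations concrete, here is a minimal Python sketch; it is not taken from the paper. It assumes the usual reading from the earlier relational work: annotations of alternative derivations are added, and annotations of data used jointly are multiplied. The names `Semiring`, `NAT`, `BOOL`, `union`, and `product` are illustrative assumptions, and an annotated collection is modelled as a plain dictionary from items to annotations.

```python
from typing import Callable, NamedTuple


class Semiring(NamedTuple):
    """A commutative semiring (K, add, mul, zero, one); hypothetical helper type."""
    zero: object
    one: object
    add: Callable
    mul: Callable


# Natural numbers: annotations count repetitions (bag semantics).
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Booleans: annotations only record membership (ordinary set semantics).
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)


def union(sr, xs, ys):
    """Union of two annotated collections: annotations of equal items are added."""
    out = dict(xs)
    for item, k in ys.items():
        out[item] = sr.add(out.get(item, sr.zero), k)
    # Items annotated with zero are treated as absent.
    return {item: k for item, k in out.items() if k != sr.zero}


def product(sr, xs, ys):
    """Pairing of two annotated collections: annotations are multiplied."""
    out = {}
    for x, kx in xs.items():
        for y, ky in ys.items():
            out[(x, y)] = sr.add(out.get((x, y), sr.zero), sr.mul(kx, ky))
    return {pair: k for pair, k in out.items() if k != sr.zero}


print(union(NAT, {"a": 2, "b": 1}, {"a": 3}))    # {'a': 5, 'b': 1}
print(product(NAT, {"a": 2}, {"x": 3}))          # {('a', 'x'): 6}
print(union(BOOL, {"a": True}, {"a": True}))     # {'a': True}
```

Swapping `NAT` for `BOOL` changes the interpretation of the same operations from counting repetitions to plain membership, which is the sense in which one framework can cover several applications.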

Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.
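The following Python sketch illustrates, under simplifying assumptions, how navigation over annotated unordered trees might combine annotations. It is not the paper's calculus: `Tree`, `tree`, `children`, `flat_map`, and `grandchildren` are hypothetical names, annotations are natural numbers, and `flat_map` only loosely mirrors an NRC-style big-union operator by multiplying annotations along a path and adding them across alternative paths.

```python
from typing import Callable, FrozenSet, NamedTuple, Tuple


class Semiring(NamedTuple):
    """A commutative semiring; hypothetical helper type."""
    zero: object
    one: object
    add: Callable
    mul: Callable


NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


class Tree(NamedTuple):
    """Unordered tree: a label plus a set of (child, annotation) pairs."""
    label: str
    kids: FrozenSet[Tuple["Tree", object]]


def tree(label, kids=None):
    """Build a tree from a label and a {child: annotation} dictionary."""
    return Tree(label, frozenset((kids or {}).items()))


def children(t):
    """Return the annotated collection of children as {child: annotation}."""
    return dict(t.kids)


def flat_map(sr, xs, f):
    """Apply f to each item of xs and merge the results.

    Each result item's annotation is multiplied by the annotation of the
    item it came from; equal result items have their annotations added.
    """
    out = {}
    for x, k in xs.items():
        for y, ky in f(x).items():
            out[y] = sr.add(out.get(y, sr.zero), sr.mul(k, ky))
    return {y: k for y, k in out.items() if k != sr.zero}


def grandchildren(sr, t):
    """Roughly the path $t/*/* over an annotated tree."""
    return flat_map(sr, children(t), children)


d = tree("d")
e = tree("e")
b = tree("b", {d: 2})
c = tree("c", {d: 3, e: 1})
a = tree("a", {b: 1, c: 2})

result = grandchildren(NAT, a)
print(result[d], result[e])  # 8 2
```

Here `d` is reachable through `b` (1 × 2) and through `c` (2 × 3), so its annotation is 2 + 6 = 8, while `e` is reachable only through `c` (2 × 1 = 2).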

Published in

PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2008
330 pages
ISBN: 9781605581521
DOI: 10.1145/1376916

Copyright © 2008 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions, 26%
