ABSTRACT
We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.
Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.]] Google Scholar
Digital Library
- S. Abiteboul, L. Segoufin, and V. Vianu. Representing and querying xml with incomplete information. TODS, 31(1):208--254, 2006.]] Google Scholar
Digital Library
- S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, 2006.]] Google Scholar
Digital Library
- P. Buneman, J. Cheney, W.-C. Tan, and S. Vansummeren. Curated databases. In PODS, 2008.]] Google Scholar
Digital Library
- P. Buneman, J. Cheney, and S. Vansummeren. On the expressiveness of implicit provenance in query and update languages. In ICDT, 2007.]] Google Scholar
Digital Library
- P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: A query language and algebra for semistructured data based on structural recursion. VLDB J., 9(1):76--110, 2000.]] Google Scholar
Digital Library
- P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001.]] Google Scholar
Digital Library
- P. Buneman, S. A. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. TCS, 149(1):3--48, 1995.]] Google Scholar
Digital Library
- P. Buneman and V. Tannen. A structural approach to query language design. In The Functional Approach to Data Management Modeling, Analyzing, and Integrating Heterogenous Data. Springer, 2004.]]Google Scholar
- J. V. den Bussche, D. V. Gucht, and S. Vansummeren. Well-definedness and semantic type-checking in the nested relational calculus and XQuery. In ICDT, 2005.]] Google Scholar
Digital Library
- D. Draper, P. Fankhauser, M. F. Fernández, A. Malhotra, K. Rose, M. Rys, J. Siméon, and P. Wadler. XQuery 1.0 and XPath 2.0 Formal Semantics. W3C, Jan. 2007.]]Google Scholar
- D. Florescu and D. Kossmann. Storing and querying XML data using an RDMBS. IEEE Data Engineering Bulletin, 22(3), 1999.]]Google Scholar
- J. N. Foster, T. J. Green, and V. Tannen. Annotated XML: Queries and provenance. Technical Report TR-CIS-08-06, University of Pennsylvania, 2008.]]Google Scholar
Digital Library
- G. Grahne, A. Thomo, and W. W. Wadge. Preferentially annotated regular path queries. In ICDT, 2007.]] Google Scholar
Digital Library
- T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In VLDB, 2007.]] Google Scholar
Digital Library
- T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, 2007.]] Google Scholar
Digital Library
- J. Hidders, N. Kwasnikowska, J. Sroka, J. Tyszkiewicz, and J. V. den Bussche. A formal model of dataflow repositories. In DILS, 2007.]] Google Scholar
Digital Library
- E. Hung, L. Getoor, and V. S. Subrahmanian. Probabilistic interval XML. ACM TOCL, 8(4), 2007.]] Google Scholar
Digital Library
- T. Imieliński and W. Lipski. Incomplete information in relational databases. JACM, 31(4), 1984.]] Google Scholar
Digital Library
- Y. Kanza, W. Nutt, and Y. Sagiv. Queries with incomplete answers over semistructured data. In PODS, 1999.]] Google Scholar
Digital Library
- S. K. Lellahi and V. Tannen. A calculus for collections and aggregates. In Category Theory and Computer Science, 1997.]] Google Scholar
Digital Library
- L. Libkin and L. Wong. Query languages for bags and aggregate functions. JCSS, 55(2):241--272, 1997.]] Google Scholar
Digital Library
- A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In VLDB, 2002.]] Google Scholar
Digital Library
- D. Olteanu, H. Meuss, T. Furche, and F. Bry. XPath: Looking forward. In EDBT Workshops, 2002.]]Google Scholar
Digital Library
- C. Ré, J. Simèon, and M. Fernández. A complete and efficient algebraic compiler for xquery. In ICDE, page 14, 2006.]] Google Scholar
Digital Library
- E. L. Robertson, L. V. Saxton, D. V. Gucht, and S. Vansummeren. Structural recursion on ordered trees and list-based complex objects. In ICDT, 2007.]] Google Scholar
Digital Library
- P. Senellart and S. Abiteboul. On the complexity of managing probabilistic XML data. In PODS, 2007.]] Google Scholar
Digital Library
- J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, and J. F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In VLDB J., 1999.]] Google Scholar
Digital Library
- M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In ICDE, 2005.]] Google Scholar
Digital Library
Index Terms
Annotated XML: queries and provenance
Recommendations
The essence of XML
The World-Wide Web Consortium (W3C) promotes XML and related standards, including XML Schema, XQuery, and XPath. This paper describes a formalization of XML Schema. A formal semantics based on these ideas is part of the official XQuery and XPath ...
XML Processing and Data Integration with XQuery
Most Web applications exchange data as XML, but they create and process this data with languages that don't have native support for XML. With appropriate middleware, XQuery can dramatically simplify this process, treating all data sources as though they ...
The essence of XML
POPL '03: Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languagesThe World-Wide Web Consortium (W3C) promotes XML and related standards, including XML Schema, XQuery, and XPath. This paper describes a formalization of XML Schema. A formal semantics based on these ideas is part of the official XQuery and XPath ...






Comments