ABSTRACT
Although XML is nowadays the de facto standard for data exchange, its predecessor SGML was originally invented to describe electronic documents, i.e., marked-up text. Indeed, large volumes of such XML texts still exist today. We consider simple transformations that may change the internal structure of documents (that is, the mark-up) and may filter out parts of the text, but do not disrupt the order of the words. Specifically, we focus on XML transformations where the transformed document, ignoring mark-up, is a subsequence of the input document. We call such transformations text-preserving XML transformations and characterize them as copy- and rearrange-free transductions. Furthermore, we study the problem of deciding whether a given XML transducer is text-preserving over a given tree language. We consider top-down transducers as well as DTL, an abstraction of XSLT. We show that deciding whether a transformation is text-preserving over an unranked regular tree language is in PTime for top-down transducers, EXPTime-complete for DTL with XPath, and decidable for DTL with MSO patterns. Finally, we show that for every transducer in any of these classes, one can compute the maximal subset of the input schema on which the transformation is text-preserving.
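To make the definition of text-preserving concrete, the following minimal Python sketch checks a single input/output pair: it strips the mark-up from both documents and tests whether the output's words form a subsequence of the input's words. It does not decide the problem studied here (whether a transducer is text-preserving on every document of a tree language); it only tests one instance. Assumptions not taken from the abstract: a "word" is a maximal run of non-whitespace characters, element boundaries act as word separators, and attribute values are ignored. The function names `words`, `is_subsequence`, and `text_preserving_on` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def words(xml_text: str) -> list[str]:
    """Return the words of a document in document order, ignoring mark-up.

    Assumption: element boundaries separate words and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields every element's text and tail in document order.
    return re.findall(r"\S+", " ".join(root.itertext()))


def is_subsequence(small: list[str], big: list[str]) -> bool:
    """True if `small` can be obtained from `big` by deleting words."""
    remaining = iter(big)
    # `w in remaining` advances the iterator past the first match,
    # so later words must be matched strictly after earlier ones.
    return all(w in remaining for w in small)


def text_preserving_on(input_xml: str, output_xml: str) -> bool:
    """Check text preservation for one input/output pair only."""
    return is_subsequence(words(output_xml), words(input_xml))


doc = "<doc><title>XML texts</title><p>large volumes of <em>marked up</em> text</p></doc>"
# Restructures mark-up and drops words, but keeps the word order.
filtered = "<section><h>XML texts</h><para>marked up text</para></section>"
# Rearranges: the title text now appears after the paragraph text.
swapped = "<section><para>marked up text</para><h>XML texts</h></section>"
# Copies: the title text is output twice.
duplicated = "<d><a>XML texts</a><b>XML texts</b></d>"

print(text_preserving_on(doc, filtered))    # True
print(text_preserving_on(doc, swapped))     # False
print(text_preserving_on(doc, duplicated))  # False
```

The two failing cases mirror the characterization above: `swapped` rearranges text and `duplicated` copies it, whereas `filtered` only changes mark-up and deletes words.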
The complexity of text-preserving XML transformations