DOI: 10.1145/1989284.1989316
research-article

The complexity of text-preserving XML transformations

Published: 13 June 2011

ABSTRACT

While XML is now the de facto standard for data exchange, its predecessor SGML was invented for describing electronic documents, i.e., marked-up text, and large volumes of such XML texts still exist today. We consider simple transformations which can change the internal structure of documents, that is, the mark-up, and can filter out parts of the text, but do not disrupt the ordering of the words. Specifically, we focus on XML transformations where the transformed document is a subsequence of the input document when ignoring mark-up. We call these text-preserving XML transformations. We characterize such transformations as copy- and rearrange-free transductions. Furthermore, we study the problem of deciding whether a given XML transducer is text-preserving over a given tree language. We consider top-down transducers as well as the abstraction of XSLT called DTL. We show that deciding whether a transformation is text-preserving over an unranked regular tree language is in PTime for top-down transducers, EXPTime-complete for DTL with XPath, and decidable for DTL with MSO patterns. Finally, we obtain that for every transducer in one of the above-mentioned classes, one can compute the maximal subset of the input schema on which the transformation is text-preserving.
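
The subsequence condition in the abstract can be illustrated on a single input/output pair. The following is a minimal sketch, not the paper's decision procedure: it only checks one concrete pair of documents, whereas the paper decides the property for a transducer over a whole tree language. The function names are hypothetical, and treating whitespace-separated tokens as "words" (with element boundaries also separating words) is an assumption made here for illustration.

```python
import xml.etree.ElementTree as ET


def text_words(xml_string):
    """Return the words of a document in document order, ignoring mark-up.

    Assumption: words are whitespace-separated tokens, and an element
    boundary also ends a word.
    """
    root = ET.fromstring(xml_string)
    words = []
    for chunk in root.itertext():
        words.extend(chunk.split())
    return words


def is_subsequence(candidate, reference):
    """True if `candidate` can be obtained from `reference` by deleting items."""
    remaining = iter(reference)
    # `word in remaining` advances the iterator, so matches must occur in order
    # and each reference word can be used at most once.
    return all(word in remaining for word in candidate)


def is_text_preserving_pair(input_xml, output_xml):
    """Check the subsequence condition for one input/output document pair."""
    return is_subsequence(text_words(output_xml), text_words(input_xml))


source = "<doc><title>Hello world</title><p>some <b>bold</b> text</p></doc>"

# Mark-up changed and words filtered out, but order kept.
filtered = "<out><h>Hello</h><q>bold text</q></out>"
# Words rearranged.
rearranged = "<out>text bold</out>"
# A word copied.
copied = "<out>Hello Hello</out>"

print(is_text_preserving_pair(source, filtered))
print(is_text_preserving_pair(source, rearranged))
print(is_text_preserving_pair(source, copied))
```

The second and third outputs fail the check because they rearrange and copy text respectively, which matches the abstract's characterization of text-preserving transformations as copy- and rearrange-free.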

Published in

PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2011
332 pages
ISBN: 9781450306607
DOI: 10.1145/1989284

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 476 of 1,835 submissions, 26%
