DOI: 10.1145/1600193.1600197
research-article

Efficient change control of XML documents

Published: 16 September 2009

ABSTRACT

XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential. Several approaches to finding the differences between XML documents have already been proposed. Typically, they are based on tree-to-tree correction or sequence alignment. Most of these algorithms, however, are too slow and do not support the subsequent merging of changes. In this paper, we present DocTreeDiff, a differencing algorithm tailored to ordered XML documents. It relies on our context-oriented XML versioning model, presented in earlier work, which allows for document merging. An empirical evaluation demonstrates the efficiency of our approach as well as the high quality of the generated deltas.

    • Published in

      DocEng '09: Proceedings of the 9th ACM symposium on Document engineering
      September 2009
      264 pages
      ISBN: 9781605585758
      DOI: 10.1145/1600193

      Copyright © 2009 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 178 of 537 submissions, 33%
