ABSTRACT
XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential. Several approaches to finding the differences between XML documents have already been proposed. Typically, they are based on tree-to-tree correction or sequence alignment. Most of these algorithms, however, are too slow and do not support the subsequent merging of changes. In this paper, we present DocTreeDiff, a differencing algorithm tailored to ordered XML documents. It relies on our context-oriented XML versioning model, presented in earlier work, which allows for document merging. An empirical evaluation demonstrates the efficiency of our approach as well as the high quality of the generated deltas.
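To make the sequence-alignment family of approaches mentioned above concrete, the following is a minimal sketch and not the DocTreeDiff algorithm itself. It assumes that an ordered XML document can be flattened into a document-order list of tokens and then aligned with Python's standard difflib module. The helper names flatten and sequence_delta are hypothetical, and attributes and tail text are ignored for brevity.

```python
import difflib
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Flatten an ordered XML tree into a document-order token list.

    Each token is (depth, tag, stripped text). Attributes and tail text
    are ignored in this simplified illustration.
    """
    tokens = []

    def visit(node, depth):
        text = (node.text or "").strip()
        tokens.append((depth, node.tag, text))
        for child in node:  # children are visited in document order
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return tokens


def sequence_delta(old_xml, new_xml):
    """Return the non-equal alignment operations between two documents.

    Illustrative only: a generic sequence alignment over flattened
    tokens, not the delta model described in the paper.
    """
    old, new = flatten(old_xml), flatten(new_xml)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    delta = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            delta.append((op, old[i1:i2], new[j1:j2]))
    return delta


if __name__ == "__main__":
    old_doc = "<doc><p>one</p></doc>"
    new_doc = "<doc><p>one</p><p>two</p></doc>"
    for operation in sequence_delta(old_doc, new_doc):
        print(operation)
```

A flat alignment of this kind loses the tree context of each change, which is one reason a delta produced this way is hard to merge into a document that has itself been modified.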