Abstract
Effectively computing the difference between two versions of a source file has become an indispensable part of software development. The de facto standard tool used by most version control systems is the UNIX diff utility, which compares two files on a line-by-line basis without any regard for the structure of the data stored in these files.
This paper presents an alternative, datatype-generic algorithm for computing the difference between two values of any algebraic datatype. This algorithm maximizes sharing between the source and target trees, while still running in linear time.
Finally, this paper demonstrates that by instantiating this algorithm to the Lua abstract syntax tree and mining the commit history of repositories found on GitHub, the resulting patches can often be merged automatically, even when existing technology has failed.
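The idea of sharing subtrees between the source and target can be illustrated with a minimal sketch. It is not the paper's implementation: the tuple encoding of trees and the names `Hole`, `diff`, and `apply_patch` are assumptions made for illustration, and the sketch relies on Python's structural hashing of tuples, so it makes no claim to run in linear time. A patch here is a pair of trees with holes: a deletion context that is matched against the source, and an insertion context that is instantiated with the subtrees bound by that match.

```python
from dataclasses import dataclass

# Assumed encoding: a node is a tuple ("Constructor", child, ...);
# anything that is not a tuple is treated as an opaque leaf value.

@dataclass(frozen=True)
class Hole:
    """Metavariable standing for a subtree shared by source and target."""
    ident: int

def subtrees(t, acc):
    """Collect every node (tuple) that occurs somewhere inside t."""
    if isinstance(t, tuple):
        acc.add(t)
        for child in t[1:]:
            subtrees(child, acc)
    return acc

def extract(t, is_shared, name_of):
    """Replace the outermost shared subtrees of t by holes."""
    if isinstance(t, tuple):
        if is_shared(t):
            return Hole(name_of(t))
        return (t[0],) + tuple(extract(c, is_shared, name_of) for c in t[1:])
    return t

def diff(src, dst):
    """Return (deletion, insertion) contexts sharing common subtrees."""
    shared = subtrees(src, set()) & subtrees(dst, set())
    names = {}
    deletion = extract(src, lambda t: t in shared,
                       lambda t: names.setdefault(t, len(names)))
    # Only holes bound by the deletion context may appear in the insertion
    # context; other common subtrees are copied literally instead.
    insertion = extract(dst, lambda t: t in names, lambda t: names[t])
    return deletion, insertion

def match(pattern, t, env):
    """Match t against pattern, binding holes consistently in env."""
    if isinstance(pattern, Hole):
        if pattern.ident in env:
            return env[pattern.ident] == t
        env[pattern.ident] = t
        return True
    if isinstance(pattern, tuple):
        return (isinstance(t, tuple) and len(t) == len(pattern)
                and t[0] == pattern[0]
                and all(match(p, c, env) for p, c in zip(pattern[1:], t[1:])))
    return pattern == t

def instantiate(pattern, env):
    """Fill every hole of pattern with the subtree bound in env."""
    if isinstance(pattern, Hole):
        return env[pattern.ident]
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(instantiate(c, env) for c in pattern[1:])
    return pattern

def apply_patch(patch, t):
    """Apply a patch to t; return None if the deletion context does not match."""
    deletion, insertion = patch
    env = {}
    if not match(deletion, t, env):
        return None
    return instantiate(insertion, env)

# Example: the target swaps the two children of the root.
src = ("Bin", ("Bin", ("Leaf", 1), ("Leaf", 2)), ("Leaf", 3))
dst = ("Bin", ("Leaf", 3), ("Bin", ("Leaf", 1), ("Leaf", 2)))
patch = diff(src, dst)
```

In this example both children of the root occur in the source and the target, so each becomes a hole and the patch only records that the two are swapped. The same patch therefore also applies to other trees of the form `("Bin", x, y)`, which is the kind of generality that a line-based diff cannot express.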
Index Terms
An efficient algorithm for type-safe structural diffing