research-article
Open Access

An efficient algorithm for type-safe structural diffing

Published: 26 July 2019

Abstract

Effectively computing the difference between two versions of a source file has become an indispensable part of software development. The de facto standard tool used by most version control systems is the UNIX diff utility, which compares two files on a line-by-line basis without any regard for the structure of the data stored in these files.

This paper presents an alternative, datatype-generic algorithm for computing the difference between two values of any algebraic datatype. This algorithm maximizes sharing between the source and target trees, while still running in linear time.
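To make the idea of sharing between source and target trees concrete, here is a minimal Python sketch. It is an illustration written for this summary, not the paper's algorithm: it assumes trees are nested tuples of the form `(constructor, child, ...)` with strings or integers as leaves, and every name in it (`Var`, `subtrees`, `extract`, `diff`, `apply_patch`) is hypothetical. A patch is taken to be a pair of tree patterns in which each subtree common to both versions is replaced by a metavariable. The sketch relies on Python's structural hashing of tuples and makes no attempt to reach the linear running time claimed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """Metavariable standing for a subtree shared by source and target."""
    index: int

def subtrees(tree):
    """Collect every compound (tuple) subtree of `tree`."""
    found = set()
    if isinstance(tree, tuple):
        found.add(tree)
        for child in tree[1:]:
            found |= subtrees(child)
    return found

def extract(tree, is_shared, name_of):
    """Copy `tree`, replacing each outermost shared subtree by a metavariable."""
    if isinstance(tree, tuple):
        if is_shared(tree):
            return name_of(tree)
        return (tree[0],) + tuple(extract(c, is_shared, name_of) for c in tree[1:])
    return tree

def diff(source, target):
    """Return a patch: a (deletion pattern, insertion pattern) pair."""
    shared = subtrees(source) & subtrees(target)
    names = {}
    # Bind a fresh variable for each outermost shared subtree of the source.
    deletion = extract(source, lambda t: t in shared,
                       lambda t: Var(names.setdefault(t, len(names))))
    # The insertion side may only mention variables the deletion side binds.
    insertion = extract(target, lambda t: t in names,
                        lambda t: Var(names[t]))
    return deletion, insertion

def match(pattern, tree, env):
    """Match `tree` against `pattern`, recording variable bindings in `env`."""
    if isinstance(pattern, Var):
        if pattern.index in env:
            return env[pattern.index] == tree
        env[pattern.index] = tree
        return True
    if isinstance(pattern, tuple):
        return (isinstance(tree, tuple) and len(tree) == len(pattern)
                and tree[0] == pattern[0]
                and all(match(p, t, env) for p, t in zip(pattern[1:], tree[1:])))
    return pattern == tree

def instantiate(pattern, env):
    """Rebuild a tree from `pattern` by substituting the bound variables."""
    if isinstance(pattern, Var):
        return env[pattern.index]
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(instantiate(p, env) for p in pattern[1:])
    return pattern

def apply_patch(patch, tree):
    """Apply `patch` to `tree`; return None if the deletion pattern does not match."""
    deletion, insertion = patch
    env = {}
    if not match(deletion, tree, env):
        return None
    return instantiate(insertion, env)

# Usage: a patch that swaps the two children of a binary node.
source = ("Bin", ("Leaf", 1), ("Leaf", 2))
target = ("Bin", ("Leaf", 2), ("Leaf", 1))
patch = diff(source, target)
result = apply_patch(patch, source)
```

Because the shared leaves are abstracted into metavariables, the patch in this usage example describes "swap the children of a `Bin` node" rather than a sequence of line insertions and deletions, so it also applies to other `Bin` trees. That is the kind of structure a line-based diff cannot express.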

Finally, this paper demonstrates that by instantiating this algorithm to the Lua abstract syntax tree and mining the commit history of repositories found on GitHub, the resulting patches can often be merged automatically, even when existing technology has failed.




Published in

Proceedings of the ACM on Programming Languages, Volume 3, Issue ICFP
August 2019
1054 pages
EISSN: 2475-1421
DOI: 10.1145/3352468

Copyright © 2019 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
