Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations

Published: 11 January 2023

Abstract

Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but their analysis relied on a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD into an algorithm that has the correct complexity and admits an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear-factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the technique with a practical implementation that differentiates most of Haskell98.
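For intuition, the naive dual-numbers reverse-mode idea the abstract describes can be sketched as follows. This is an illustrative sketch only (in Python rather than the paper's Haskell, and deliberately without the paper's optimisations): each scalar is paired with a backpropagator that maps an incoming output cotangent to a gradient, represented here as a dictionary from input-variable names to sensitivities. The names `Dual`, `var`, and `merge` are hypothetical, not from the paper.

```python
# Naive dual-numbers reverse-mode AD: each scalar carries a backpropagator
# (cotangent -> gradient dict). NOT the paper's optimised algorithm.

class Dual:
    def __init__(self, primal, bp):
        self.primal = primal  # the ordinary value
        self.bp = bp          # backpropagator: cotangent -> {var: sensitivity}

    def __add__(self, other):
        # d(x + y) propagates the cotangent unchanged to both arguments
        return Dual(self.primal + other.primal,
                    lambda d: merge(self.bp(d), other.bp(d)))

    def __mul__(self, other):
        # d(x * y) scales the cotangent by the other argument's primal
        return Dual(self.primal * other.primal,
                    lambda d: merge(self.bp(d * other.primal),
                                    other.bp(d * self.primal)))

def merge(g1, g2):
    # pointwise sum of two gradient dicts
    out = dict(g1)
    for k, v in g2.items():
        out[k] = out.get(k, 0.0) + v
    return out

def var(name, value):
    # an input variable's backpropagator records its own sensitivity
    return Dual(value, lambda d: {name: d})

# Gradient of f(x, y) = x*y + x at (x, y) = (3, 4):
x, y = var("x", 3.0), var("y", 4.0)
z = x * y + x
grad = z.bp(1.0)  # seed the output cotangent with 1
# grad == {"x": 5.0, "y": 3.0}
```

Note that in this naive formulation a shared subexpression's backpropagator can be invoked once per use site, which causes the well-known complexity blow-up of dual-numbers reverse AD; eliminating exactly that inefficiency (via linear factoring and mutable arrays) is the contribution the abstract describes.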

References

  1. Martín Abadi and Gordon D. Plotkin. 2020. A simple differentiable programming language. Proc. ACM Program. Lang., 4, POPL (2020), 38:1–38:28. https://doi.org/10.1145/3371106
  2. Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. Automatic Differentiation in Machine Learning: a Survey. J. Mach. Learn. Res., 18 (2017), 153:1–153:43. http://jmlr.org/papers/v18/17-468.html
  3. Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2018. Linear Haskell: practical linearity in a higher-order polymorphic language. Proc. ACM Program. Lang., 2, POPL (2018), 5:1–5:29. https://doi.org/10.1145/3158093
  4. Aloïs Brunel, Damiano Mazza, and Michele Pagani. 2020. Backpropagation in the simply typed lambda-calculus with linear negation. Proc. ACM Program. Lang., 4, POPL (2020), 64:1–64:27. https://doi.org/10.1145/3371132
  5. Paulo Emílio de Vilhena and François Pottier. 2021. Verifying a Minimalist Reverse-Mode AD Library. arXiv preprint arXiv:2112.07292.
  6. Conal Elliott. 2018. The simple essence of automatic differentiation. Proc. ACM Program. Lang., 2, ICFP (2018), 70:1–70:29. https://doi.org/10.1145/3236765
  7. Andreas Griewank and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition. SIAM. isbn:978-0-89871-659-7. https://doi.org/10.1137/1.9780898717761
  8. R. John M. Hughes. 1986. A Novel Representation of Lists and its Application to the Function "reverse". Inf. Process. Lett., 22, 3 (1986), 141–144. https://doi.org/10.1016/0020-0190(86)90059-1
  9. Mathieu Huot, Sam Staton, and Matthijs Vákár. 2020. Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing. In Foundations of Software Science and Computation Structures - 23rd International Conference, FOSSACS 2020, Dublin, Ireland, April 25-30, 2020, Proceedings, Jean Goubault-Larrecq and Barbara König (Eds.) (Lecture Notes in Computer Science, Vol. 12077). Springer, 319–338. https://doi.org/10.1007/978-3-030-45231-5_17
  10. Edward Kmett and contributors. 2021. ad: Automatic Differentiation. https://hackage.haskell.org/package/ad
  11. Faustyna Krawiec, Simon Peyton Jones, Neel Krishnaswami, Tom Ellis, Richard A. Eisenberg, and Andrew W. Fitzgibbon. 2022. Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation. Proc. ACM Program. Lang., 6, POPL (2022), 1–30. https://doi.org/10.1145/3498710
  12. John Launchbury and Simon L. Peyton Jones. 1994. Lazy Functional State Threads. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, USA, June 20-24, 1994, Vivek Sarkar, Barbara G. Ryder, and Mary Lou Soffa (Eds.). ACM, 24–35. https://doi.org/10.1145/178243.178246
  13. Seppo Linnainmaa. 1970. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki.
  14. Charles C. Margossian. 2019. A review of automatic differentiation and its efficient implementation. Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 9, 4 (2019). https://doi.org/10.1002/widm.1305
  15. Damiano Mazza and Michele Pagani. 2021. Automatic differentiation in PCF. Proc. ACM Program. Lang., 5, POPL (2021), 1–27. https://doi.org/10.1145/3434309
  16. Fernando Lucatelli Nunes and Matthijs Vákár. 2021. CHAD for Expressive Total Languages. CoRR, abs/2110.00446 (2021). arXiv:2110.00446
  17. Fernando Lucatelli Nunes and Matthijs Vákár. 2022. Automatic Differentiation for ML-family languages: correctness via logical relations. CoRR, abs/2210.07724 (2022). arXiv:2210.07724
  18. Fernando Lucatelli Nunes and Matthijs Vákár. 2022. Logical Relations for Partial Features and Automatic Differentiation Correctness. CoRR, abs/2210.08530 (2022). arXiv:2210.08530
  19. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS 2017 Autodiff Workshop: The future of gradient-based machine learning software and techniques. Curran Associates, Inc., Red Hook, NY, USA.
  20. Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, and Dougal Maclaurin. 2021. Getting to the Point: Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming. CoRR, abs/2104.05372 (2021). arXiv:2104.05372
  21. Barak A. Pearlmutter and Jeffrey Mark Siskind. 2008. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Trans. Program. Lang. Syst., 30, 2 (2008), 7:1–7:36. https://doi.org/10.1145/1330017.1330018
  22. John C. Reynolds. 1998. Definitional Interpreters for Higher-Order Programming Languages. High. Order Symb. Comput., 11, 4 (1998), 363–397. https://doi.org/10.1023/A:1010027404223
  23. Robert Schenck, Ola Rønning, Troels Henriksen, and Cosmin E. Oancea. 2022. AD for an Array Language with Nested Parallelism. CoRR, abs/2202.10297 (2022). arXiv:2202.10297
  24. Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, and Simon Peyton Jones. 2019. Efficient differentiable programming in a functional array-processing language. Proc. ACM Program. Lang., 3, ICFP (2019), 97:1–97:30. https://doi.org/10.1145/3341701
  25. Tim Sheard and Simon L. Peyton Jones. 2002. Template meta-programming for Haskell. ACM SIGPLAN Notices, 37, 12 (2002), 60–75. https://doi.org/10.1145/636517.636528
  26. Jesse Sigal. 2021. Automatic differentiation via effects and handlers: An implementation in Frank. arXiv preprint arXiv:2101.08095.
  27. Tom Smeding and Matthijs Vákár. 2022. Artifact for Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations. https://doi.org/10.5281/zenodo.7130343 (artifact for this publication)
  28. Tom Smeding and Matthijs Vákár. 2022. Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations. CoRR, abs/2207.03418v2 (2022). https://doi.org/10.48550/arXiv.2207.03418
  29. B. Speelpenning. 1980. Compiling fast partial derivatives of functions given by algorithms. University of Illinois. https://doi.org/10.2172/5254402
  30. Matthijs Vákár. 2021. Reverse AD at Higher Types: Pure, Principled and Denotationally Correct. In Programming Languages and Systems, Nobuko Yoshida (Ed.) (Lecture Notes in Computer Science, Vol. 12648). Springer, 607–634. https://doi.org/10.1007/978-3-030-72019-3_22
  31. Matthijs Vákár and Tom Smeding. 2022. CHAD: Combinatory Homomorphic Automatic Differentiation. ACM Trans. Program. Lang. Syst., 44, 3, 20:1–20:49. https://doi.org/10.1145/3527634
  32. Dimitrios Vytiniotis, Dan Belov, Richard Wei, Gordon Plotkin, and Martin Abadi. 2019. The differentiable curry. NeurIPS Workshop on Program Transformations.
  33. Fei Wang and Tiark Rompf. 2018. A Language and Compiler View on Differentiable Programming. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJxJtYkPG
  34. R. E. Wengert. 1964. A simple automatic derivative evaluation program. Commun. ACM, 7, 8 (1964), 463–464. https://doi.org/10.1145/355586.364791
