Research Article (Open Access)

You Only Linearize Once: Tangents Transpose to Gradients

Published: 11 January 2023

Abstract

Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two “modes”—forward and reverse—which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part.
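
As a concrete illustration of this decomposition (our own sketch, not an excerpt from the paper), JAX already exposes the stages as separate user-facing transformations: `jax.linearize` performs forward-mode AD and unzips the primal computation from the linear tangent map, and `jax.linear_transpose` transposes that linear map. The function `f` and the input value below are arbitrary examples.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x  # arbitrary smooth example function

x = 1.5

# (i) + (ii): jax.linearize runs forward-mode AD and unzips the result,
# returning the primal output alongside the linear tangent map of f at x.
y, f_lin = jax.linearize(f, x)

# (iii): jax.linear_transpose transposes the linear map, turning the
# tangent map (a Jacobian-vector product) into a cotangent map
# (a vector-Jacobian product). x is passed only for shape/dtype info.
f_trans = jax.linear_transpose(f_lin, x)

# Pulling the cotangent 1.0 back through the transposed map recovers the
# gradient: composing the three stages reproduces reverse-mode AD.
(df_dx,) = f_trans(1.0)
assert jnp.allclose(df_dx, jax.grad(f)(x))
```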

To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping let expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD.
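
The checkpointing observation also has a visible counterpart in JAX: `jax.checkpoint` (also exported as `jax.remat`) takes the other branch of that free choice, recomputing intermediates during the backward pass rather than storing them. A minimal sketch with an arbitrary example function, using only the public `jax.grad` and `jax.checkpoint` APIs:

```python
import jax
import jax.numpy as jnp

def deep(x):
    # A chain of nonlinear ops whose intermediates would ordinarily all
    # be saved for the backward pass.
    for _ in range(50):
        x = jnp.sin(x) * jnp.cos(x)
    return x

# jax.checkpoint makes the opposite choice when unzipping: discard the
# intermediates and recompute them during transposition, trading FLOPs
# for memory. The resulting gradient is unchanged.
grad_stored = jax.grad(deep)(1.0)
grad_remat = jax.grad(jax.checkpoint(deep))(1.0)
assert jnp.allclose(grad_stored, grad_remat)
```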

We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).



Published in

Proceedings of the ACM on Programming Languages, Volume 7, Issue POPL, January 2023, 2196 pages. EISSN: 2475-1421. DOI: 10.1145/3554308.

Copyright © 2023 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States.
