Abstract
Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two “modes”—forward and reverse—which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part.
To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping let expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD.
We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).
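The decomposition described above is directly visible in JAX's public API: `jax.linearize` performs forward-mode AD together with partial evaluation (the "unzipping" step), producing the primal output and a linear tangent map, and `jax.linear_transpose` transposes that linear map. A minimal sketch of composing the two to recover reverse-mode AD (the function `f` below is an arbitrary illustrative choice, not from the paper):

```python
import jax
import jax.numpy as jnp

# An example scalar function to differentiate (illustrative choice).
def f(x):
    return jnp.sin(x) * x

x = 2.0

# Steps (i) + (ii): forward-mode AD plus unzipping of the linear part
# yield the primal output and a linear map on tangents.
y, f_lin = jax.linearize(f, x)

# Step (iii): transpose the linear map, turning output cotangents
# into input cotangents.
f_transposed = jax.linear_transpose(f_lin, x)

# Composing the three steps recovers reverse-mode AD: seeding the
# transposed map with 1.0 yields the gradient of f at x.
(grad_x,) = f_transposed(1.0)
```

Here `grad_x` agrees with `jax.grad(f)(x)`, i.e. with the analytic derivative `x * cos(x) + sin(x)` at `x = 2.0`.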
You Only Linearize Once: Tangents Transpose to Gradients