Abstract
Backpropagation is a classic automatic differentiation algorithm that computes the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most notably machine learning, where it is the key to training (deep) neural networks efficiently. Recent years have witnessed the quick growth of a research field called differentiable programming, the aim of which is to express computational graphs more synthetically and modularly by resorting to actual programming languages endowed with control flow operators and higher-order combinators, such as map and fold. In this paper, we extend the backpropagation algorithm to a paradigmatic example of such a programming language: we define a compositional program transformation from the simply typed lambda-calculus to itself augmented with a notion of linear negation, and prove that this transformation computes the gradient of the source program with the same efficiency as first-order backpropagation. The transformation is completely effect-free and thus provides a purely logical understanding of the dynamics of backpropagation.
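The general idea of effect-free backpropagation can be illustrated with a small sketch. The Python code below is not the transformation defined in the paper: it is a naive illustration, under our own assumptions, of pairing each real value with a "backpropagator", a linear function from the sensitivity of the output to a gradient over the inputs. All names (`var`, `add`, `mul`, `gradient`) are hypothetical and chosen for this example only, and the sketch makes no attempt to reproduce the efficiency guarantee stated in the abstract.

```python
from functools import reduce

# Assumption for this sketch: a transformed real number is a pair
# (value, backprop), where backprop maps a scalar sensitivity s to a
# list of d partial-derivative contributions, one per input variable.

def var(i, d, x):
    """The i-th of d input variables, with value x."""
    return (x, lambda s: [s if j == i else 0.0 for j in range(d)])

def add(a, b):
    """Sum of two transformed values; sensitivities pass through unchanged."""
    (x, bx), (y, by) = a, b
    return (x + y, lambda s: [u + v for u, v in zip(bx(s), by(s))])

def mul(a, b):
    """Product of two transformed values; each side receives the
    sensitivity scaled by the other side's value (product rule)."""
    (x, bx), (y, by) = a, b
    return (x * y, lambda s: [u + v for u, v in zip(bx(s * y), by(s * x))])

def gradient(f, xs):
    """Run f on transformed inputs, then call the backpropagator once
    with sensitivity 1.0 to obtain the whole gradient."""
    d = len(xs)
    value, backprop = f([var(i, d, x) for i, x in enumerate(xs)])
    return value, backprop(1.0)

# A program written with a higher-order combinator (a fold):
# f(x1, x2, x3) = x1 * x2 * x3
product = lambda args: reduce(mul, args)

print(gradient(product, [2.0, 3.0, 4.0]))  # (24.0, [12.0, 8.0, 6.0])
```

No mutable state or other side effect is involved: the backward pass is just the application of a function built compositionally during the forward pass. In this naive form the backpropagators of shared subexpressions may be invoked repeatedly, which is where a careless implementation loses the efficiency of first-order backpropagation.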
Index Terms
Backpropagation in the simply typed lambda-calculus with linear negation