Abstract
We study the correctness of automatic differentiation (AD) in the context of a higher-order, Turing-complete language (PCF with real numbers), both in forward and reverse mode. Our main result is that, under mild hypotheses on the primitive functions included in the language, AD is almost everywhere correct: it computes the derivative or gradient of the program under consideration except on a set of Lebesgue measure zero. Stated otherwise, there are inputs on which AD is incorrect, but the probability of randomly choosing such an input is zero. Our result is in fact more precise, in that the set of failure points admits a more explicit description: for example, when the primitive functions are just constants, addition and multiplication, the set of points where AD fails is contained in a countable union of zero sets of polynomials.
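The phenomenon described in the abstract can be illustrated with a minimal sketch of forward-mode AD via dual numbers. The example below is not taken from the paper; the `Dual` class and the function names are hypothetical, and the program `f` is the well-known "if-problem" instance: a conditional that makes `f` semantically the identity (true derivative 1 everywhere), yet AD reports derivative 0 at the single point x = 0, the zero set of the polynomial x.

```python
class Dual:
    """Pair (value, derivative) propagated by forward-mode AD."""
    def __init__(self, val, der):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def f(x):
    # Semantically f is the identity, f(x) = x, so its derivative is 1
    # everywhere; but the "then" branch is the constant 0.
    if x.val == 0.0:
        return Dual(0.0, 0.0)
    else:
        return x


def d(g, x0):
    """Derivative of g at x0 as computed by forward-mode AD."""
    return g(Dual(x0, 1.0)).der


# AD is correct at every input outside the measure-zero set {0} ...
assert d(f, 2.0) == 1.0
assert d(f, -0.5) == 1.0
# ... but incorrect exactly on {0}, the zero set of the polynomial x:
assert d(f, 0.0) == 0.0
```

Randomly sampling an input from any continuous distribution hits the failure set {0} with probability zero, which is the sense in which AD is "almost everywhere correct" here.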
Automatic differentiation in PCF