Abstract
Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear---discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in such programming contexts, we define a small but expressive programming language that includes a construct for reverse-mode differentiation. We give operational and denotational semantics for this language. The operational semantics employs popular implementation techniques, while the denotational semantics employs notions of differentiation familiar from real analysis. We establish that these semantics coincide.
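The reverse-mode differentiation construct the abstract refers to is typically implemented with the "tape" (trace) technique: the forward evaluation records each primitive operation together with its local partial derivatives, and a backward sweep over the tape accumulates adjoints. The sketch below is purely illustrative (a minimal tape in Python, not the paper's language or semantics); the names `Var`, `grad`, and the tape layout are assumptions of this sketch.

```python
# Minimal tape-based reverse-mode AD sketch (illustrative only; not the
# paper's language). The forward pass records, for each operation, the
# output variable and the local partials with respect to its inputs;
# the backward sweep walks the tape in reverse, accumulating adjoints.

class Var:
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape        # shared list of (output, [(input, partial)])
        self.adjoint = 0.0

    def _record(self, value, partials):
        out = Var(value, self.tape)
        self.tape.append((out, partials))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

def grad(output, inputs):
    """Reverse sweep: seed the output adjoint, then replay the tape backwards."""
    output.adjoint = 1.0
    for out, partials in reversed(output.tape):
        for var, partial in partials:
            var.adjoint += out.adjoint * partial
    return [v.adjoint for v in inputs]

# Gradient of f(x, y) = x*y + x at (3, 4): (df/dx, df/dy) = (y + 1, x) = (5, 3).
tape = []
x, y = Var(3.0, tape), Var(4.0, tape)
z = x * y + x
gx, gy = grad(z, [x, y])
print(gx, gy)   # 5.0 3.0
```

The denotational side of the paper's correctness claim amounts to checking that the adjoints computed by such a sweep agree with the mathematical derivative; here they match the hand-computed partials `(y + 1, x)`.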
A simple differentiable programming language