Research article · Open Access

A simple differentiable programming language

Published: 20 December 2019

Abstract

Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear---discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in such programming contexts, we define a small but expressive programming language that includes a construct for reverse-mode differentiation. We give operational and denotational semantics for this language. The operational semantics employs popular implementation techniques, while the denotational semantics employs notions of differentiation familiar from real analysis. We establish that these semantics coincide.
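The language studied here includes a construct for reverse-mode differentiation, and its operational semantics employs popular implementation techniques. As a point of reference only, the following is a minimal tape-based reverse-mode AD sketch in Python (the standard trace-and-backpropagate technique, not the paper's language or semantics; the class and method names are illustrative):

```python
# Minimal sketch of reverse-mode automatic differentiation.
# A Var records its primal value and the local partial derivatives
# with respect to its parents; backward() sweeps the computation
# graph in reverse, accumulating adjoints.

class Var:
    def __init__(self, value, parents=()):
        self.value = value        # primal value
        self.grad = 0.0           # adjoint, filled in by backward()
        self.parents = parents    # pairs (parent Var, local partial)

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self):
        # Topologically order the graph, then propagate adjoints in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in v.parents:
                p.grad += local * v.grad

# f(x, y) = x*y + x, so df/dx = y + 1 and df/dy = x.
x, y = Var(3.0), Var(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)   # 5.0 3.0
```

A single backward sweep computes the gradient with respect to all inputs at once, which is why reverse mode dominates in machine learning, where functions map many parameters to one scalar loss.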



Published in
Proceedings of the ACM on Programming Languages, Volume 4, Issue POPL (January 2020), 1984 pages
EISSN: 2475-1421
DOI: 10.1145/3377388
Copyright © 2019 Owner/Author
Publisher: Association for Computing Machinery, New York, NY, United States
