Abstract
We present a system for the automatic differentiation (AD) of a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source forward-mode AD and global optimisations such as loop transformations. We show that, in combination, gradient computation with forward-mode AD can be as efficient as reverse mode, and that the Jacobian matrices required for numerical algorithms such as Gauss-Newton and Levenberg-Marquardt can be efficiently computed.
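To make the forward-mode idea concrete, the following is a minimal sketch in Python, not the system described in the abstract. It uses operator overloading on dual numbers rather than source-to-source transformation, and the names `Dual`, `jacobian`, and `residuals` are hypothetical, chosen only for illustration. It builds the Jacobian of a small residual function one column at a time, the kind of matrix that Gauss-Newton and Levenberg-Marquardt consume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its tangent (directional derivative)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)
    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a constant (zero tangent)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def jacobian(f, xs):
    """Jacobian of f: R^n -> R^m using n forward passes, one per input."""
    columns = []
    for i in range(len(xs)):
        # Seed the i-th input with tangent 1 and all others with 0.
        seeded = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(xs)]
        columns.append([_lift(y).tan for y in f(seeded)])
    # Each pass yields one column; transpose to get rows per output.
    return [list(row) for row in zip(*columns)]


def residuals(p):
    """Hypothetical two-parameter residual function used as an example."""
    a, b = p
    return [a * b - 2.0, a * a + 3.0 * b]


J = jacobian(residuals, [1.0, 2.0])
print(J)
```

This naive version runs one full pass per input variable, which is the usual cost argument against forward mode for gradients. The abstract's claim is that combining forward-mode AD with global optimisations such as loop transformations can remove that overhead; the sketch does not attempt to reproduce those optimisations.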
Index Terms
Efficient differentiable programming in a functional array-processing language