skip to main content
research-article
Open Access

Efficient differentiable programming in a functional array-processing language

Published:26 July 2019Publication History
Skip Abstract Section

Abstract

We present a system for the automatic differentiation (AD) of a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source forward-mode AD and global optimisations such as loop transformations. In combination, gradient computation with forward-mode AD can be as efficient as reverse mode, and that the Jacobian matrices required for numerical algorithms such as Gauss-Newton and Levenberg-Marquardt can be efficiently computed.

Skip Supplemental Material Section

Supplemental Material

a97-shaikhha.webm

References

  1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning.. In OSDI, Vol. 16. 265–283. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Sameer Agarwal, Noah Snavely, Steven M Seitz, and Richard Szeliski. 2010. Bundle adjustment in the large. In European conference on computer vision. Springer, 29–42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving Web Search Ranking by Incorporating User Behavior Information. In SIGIR. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Johan Anker and Josef Svenningsson. 2013. An EDSL approach to high performance Haskell programming. In ACM Haskell Symposium. 1–12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Ricardo Baeza-Yates, Carlos Hurtado, and Marcelo Mendoza. 2004. Query Recommendation Using Query Logs in Search Engines. In EDBT. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2015b. Automatic differentiation in machine learning: a survey. arXiv preprint arXiv:1502.05767 (2015).Google ScholarGoogle Scholar
  7. Atilim Gunes Baydin, Barak A Pearlmutter, and Jeffrey Mark Siskind. 2015a. Diffsharp: Automatic differentiation library. arXiv preprint arXiv:1511.07727 (2015).Google ScholarGoogle Scholar
  8. James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: A CP U and GP U math compiler in Python. In Proc. 9th Python in Science Conf. 1–7.Google ScholarGoogle Scholar
  9. Michael W. Berry, Murray Browne, Amy N. Langville, V. Paul Pauca, and Robert J. Plemmons. 2006. Algorithms and applications for approximate nonnegative matrix factorization. In Computational Statistics and Data Analysis.Google ScholarGoogle Scholar
  10. Christian Bischof, Peyvand Khademi, Andrew Mauer, and Alan Carle. 1996. ADIFOR 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science and Engineering 3, 3 (1996), 18–32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Christian H Bischof, HM Bucker, Bruno Lang, Arno Rasch, and Andre Vehreschild. 2002. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Source Code Analysis and Manipulation, 2002. Proceedings. Second IEEE International Workshop on. IEEE, 65–72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Charisee Chiw, Gordon Kindlmann, John Reppy, Lamont Samuels, and Nick Seltzer. 2012. Diderot: A Parallel DSL for Image Analysis and Visualization (PLDI ’12). ACM, 111–120. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Koen Claessen, Mary Sheeran, and Bo Joel Svensson. 2012. Expressive Array Constructs in an Embedded GP U Kernel Programming Language (DAMP ’12). ACM, NY, USA, 21–30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cliff Click. 1995. Global code motion/global value numbering. ACM Sigplan Notices 30, 6 (1995), 246–257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion. From Lists to Streams to Nothing at All (ICFP ’07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Olivier Danvy and Andrzej Filinski. 1990. Abstracting control. In Proceedings of the 1990 ACM conference on LISP and functional programming. ACM, 151–160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Frédéric De Mesmay, Arpad Rimmel, Yevgen Voronenko, and Markus Püschel. 2009. Bandit-based optimization on graphs with application to library performance tuning. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 729–736. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Zachary DeVito, Michael Mara, Michael Zollhöfer, Gilbert Bernstein, Jonathan Ragan-Kelley, Christian Theobalt, Pat Hanrahan, Matthew Fisher, and Matthias Nießner. 2016. Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging. arXiv preprint arXiv:1604.06525 (2016).Google ScholarGoogle Scholar
  19. Conal Elliott. 2018. The Simple Essence of Automatic Differentiation. Proc. ACM Program. Lang. 2, ICFP, Article 70 (July 2018), 70:1–70:29 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Conal M Elliott. 2009. Beautiful differentiation. In ACM Sigplan Notices, Vol. 44. ACM, 191–202. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Cormac Flanagan, Amr Sabry, Bruce F Duba, and Matthias Felleisen. 1993. The Essence of Compiling with Continuations. In ACM SIGPLAN Notices, Vol. 28. ACM, 237–247. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Shaun A Forth. 2006. An efficient overloaded implementation of forward mode automatic differentiation in MATLAB. ACM Transactions on Mathematical Software (TOMS) 32, 2 (2006), 195–222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Jeremy Gibbons. 2006. Fission for Program Comprehension. In Mathematics of Program Construction, Tarmo Uustalu (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 162–179. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Andrew Gill, John Launchbury, and Simon L Peyton Jones. 1993. A short cut to deforestation (FPCA). ACM, 223–232. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Clemens Grelck and Sven-Bodo Scholz. 2006. SAC—A Functional Array Language for Efficient Multi-threaded Execution. Int. Journal of Parallel Programming 34, 4 (2006), 383–427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Brian Guenter. 2007. Efficient Symbolic Differentiation for Graphics Applications. ACM Trans. Graph. 26, 3, Article 108 (July 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Laurent Hascoet and Valérie Pascual. 2013. The Tapenade Automatic Differentiation Tool: Principles, Model, and Specification. ACM Trans. Math. Softw. 39, 3, Article 20 (May 2013), 43 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Troels Henriksen, Niels GW Serup, Martin Elsman, Fritz Henglein, and Cosmin E Oancea. 2017. Futhark: purely functional GP U-programming with nested parallelism and in-place array updates. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 556–571. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Florent Hivert and N Thiéry. 2004. MuPAD-Combinat, an open-source package for research in algebraic combinatorics. Sém. Lothar. Combin 51 (2004), 70.Google ScholarGoogle Scholar
  30. Robin J. Hogan. 2014. Fast Reverse-Mode Automatic Differentiation Using Expression Templates in C++. ACM Trans. Math. Softw. 40, 4, Article 26 (July 2014), 16 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Paul Hudak. 1996. Building Domain-specific Embedded Languages. ACM Comput. Surv. 28, 4 (Dec. 1996). Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Graham Hutton. 1999. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming 9, 4 (1999), 355–372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Kenneth E Iverson. 1962. A Programming Language. In Proceedings of the May 1-3, 1962, spring joint computer conference. ACM, 345–351. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Hong-Jiang Zhang Ji-rong Wen, Jian-Yun Nie and. 2002. Query Clustering Using User Logs. ACM Transactions on Information Systems 20, 1 (2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Simon Peyton Jones, Andrew Tolmach, and Tony Hoare. 2001. Playing by the rules: rewriting as a practical optimisation technique in GHC. In Haskell workshop, Vol. 1. 203–233.Google ScholarGoogle Scholar
  36. Manohar Jonnalagedda and Sandro Stucki. 2015. Fold-based Fusion As a Library: A Generative Programming Pearl. In Proceedings of the 6th ACM SIGPLAN Symposium on Scala. ACM, 41–50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Jerzy Karczmarczuk. 1999. Functional differentiation of computer programs. ACM SIGPLAN Notices 34, 1 (1999), 195–203. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Kamil A Khan and Paul I Barton. 2015. A vector forward mode of automatic differentiation for generalized derivative evaluation. Optimization Methods and Software 30, 6 (2015), 1185–1212. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2017. Stream Fusion, to Completeness. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 285–299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Chris Leary and Todd Wang. 2017. XLA: TensorFlow, compiled. TensorFlow Dev Summit (2017).Google ScholarGoogle Scholar
  41. Kenneth Levenberg. 1944. A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics 2, 2 (1944), 164–168.Google ScholarGoogle Scholar
  42. Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, and Yi-Min Wang. 2010. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on mapreduce. In Proceedings of the 19th international conference on World wide web. ACM, 681–690. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Dougal Maclaurin, David Duvenaud, and Ryan P Adams. 2015. Autograd: Effortless gradients in numpy. In ICML 2015 AutoML Workshop.Google ScholarGoogle Scholar
  44. Donald W Marquardt. 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics 11, 2 (1963), 431–441.Google ScholarGoogle ScholarCross RefCross Ref
  45. Jorge J Moré. 1978. The Levenberg-Marquardt algorithm: implementation and theory. In Numerical analysis. Springer, 105–116.Google ScholarGoogle Scholar
  46. Sri Hari Krishna Narayanan, Boyana Norris, and Beata Winnicka. 2010. ADIC2: Development of a component source transformation system for differentiating C and C++. Procedia Computer Science 1, 1 (2010), 1845–1853.Google ScholarGoogle ScholarCross RefCross Ref
  47. Peter Norvig. 1992. Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Lionel Parreaux, Amir Shaikhha, and Christoph E. Koch. 2017a. Quoted Staged Rewriting: A Practical Approach to Librarydefined Optimizations. In Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2017). ACM, New York, NY, USA, 131–145. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Lionel Parreaux, Antoine Voizard, Amir Shaikhha, and Christoph E. Koch. 2017b. Unifying Analytic and Statically-typed Quasiquotes. Proc. ACM Program. Lang. 2, POPL, Article 13 (Dec. 2017), 33 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017).Google ScholarGoogle Scholar
  51. Barak A Pearlmutter and Jeffrey Mark Siskind. 2007. Lazy multivariate higher-order forward-mode AD. In ACM SIGPLAN Notices, Vol. 42. ACM, 155–160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Barak A Pearlmutter and Jeffrey Mark Siskind. 2008. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS) 30, 2 (2008), 7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Markus Puschel, José MF Moura, Jeremy R Johnson, David Padua, Manuela M Veloso, Bryan W Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, et al. 2005. SPIRAL: Code generation for DSP transforms. Proc. IEEE 93, 2 (2005), 232–275.Google ScholarGoogle ScholarCross RefCross Ref
  54. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines (PLDI ’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Jarrett Revels, Miles Lubin, and Theodore Papamarkou. 2016. Forward-mode automatic differentiation in Julia. arXiv preprint arXiv:1607.07892 (2016).Google ScholarGoogle Scholar
  56. Tiark Rompf and Martin Odersky. 2010. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs. In the ninth international conference on Generative programming and component engineering (GPCE ’10). ACM, New York, NY, USA, 127–136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. 2013. Optimizing Data Structures in High-level Programs: New Directions for Extensible Compilers Based on Staging. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’13). ACM, New York, NY, USA, 497–510. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, James Hegeman, Meghan Lele, Roman Levenstein, et al. 2018. Glow: Graph lowering compiler techniques for neural networks. arXiv preprint arXiv:1805.00907 (2018).Google ScholarGoogle Scholar
  59. Amir Shaikhha, Mohammad Dashti, and Christoph Koch. 2018. Push versus Pull-Based Loop Fusion in Query Engines. Journal of Functional Programming 28 (2018), e10.Google ScholarGoogle ScholarCross RefCross Ref
  60. Amir Shaikhha, Andrew Fitzgibbon, Simon Peyton Jones, and Dimitrios Vytiniotis. 2017. Destination-passing Style for Efficient Memory Management. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional HighPerformance Computing (FHPC 2017). ACM, New York, NY, USA, 12–23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Amir Shaikhha and Lionel Parreaux. 2019. Finally, a Polymorphic Linear Algebra Language. In Proceedings of the 33rd European Conference on Object-Oriented Programming (ECOOP’19).Google ScholarGoogle Scholar
  62. Jeffrey Mark Siskind and Barak A Pearlmutter. 2005. Perturbation confusion and referential transparency: Correct functional implementation of forward-mode AD. (2005).Google ScholarGoogle Scholar
  63. Jeffrey Mark Siskind and Barak A Pearlmutter. 2008. Nesting forward-mode AD in a functional framework. Higher-Order and Symbolic Computation 21, 4 (2008), 361–376. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Daniele G Spampinato and Markus Püschel. 2016. A basic linear algebra compiler for structured matrices. In CGO ’16. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Suvrit Sra and Inderjit S. Dhillon. 2006. Nonnegative matrix approximation: algorithms and applications. Technical Report.Google ScholarGoogle Scholar
  66. Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. 2015. Generating Performance Portable Code Using Rewrite Rules: From High-level Functional Expressions to High-performance OpenCL Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA, 205–217. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Arvind Sujeeth, HyoukJoong Lee, Kevin Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning (ICML ’11). 609–616. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Josef Svenningsson. 2002. Shortcut Fusion for Accumulating Parameters & Zip-like Functions (ICFP ’02). ACM, 124–132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Bo Joel Svensson and Josef Svenningsson. 2014. Defunctionalizing Push Arrays (FHPC ’14). ACM, NY, USA, 43–52.Google ScholarGoogle Scholar
  70. Walid Taha and Tim Sheard. 2000. MetaML and multi-stage programming with explicit annotations. Theor. Comput. Sci. 248, 1-2 (2000), 211–242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Bill Triggs, Philip F McLauchlan, Richard I Hartley, and Andrew W Fitzgibbon. 1999. Bundle adjustment—a modern synthesis. In Inter. workshop on vision algorithms. Springer, 298–372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Philip Wadler. 1988. Deforestation: Transforming programs to eliminate trees. In ESOP’88. Springer, 344–358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Fei Wang, James Decker, Xilun Wu, Gregory Essertel, and Tiark Rompf. 2018. Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming. In Advances in Neural Information Processing Systems. 10200– 10211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Matthew J Weinstein and Anil V Rao. 2016. Algorithm: ADiGator, a toolbox for the algorithmic differentiation of mathematical functions in MATLAB using source transformation via operator overloading. ACM Trans. Math. Softw (2016). Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Christopher Zach. 2014. Robust bundle adjustment revisited. In European Conference on Computer Vision. Springer, 772–787.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Efficient differentiable programming in a functional array-processing language

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!