Research Article | Open Access

ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs

Published: 11 January 2023

Abstract

Optimizing the expected values of probabilistic processes is a central problem in computer science and its applications, arising in fields ranging from artificial intelligence to operations research to statistical computing. Unfortunately, automatic differentiation techniques developed for deterministic programs do not, in general, compute the correct gradients needed for widely used gradient-based optimization methods.
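To see concretely why naive AD goes wrong here, consider a toy example (the function `f` and the constants below are illustrative choices, not taken from the paper): the expectation of f(b) for b ~ Bernoulli(θ) has derivative f(1) − f(0) in θ, yet applying AD through a sampling procedure treats each discrete sample as a constant in θ and so reports a derivative of zero.

```python
import random

# Toy illustration (f and the constants are hypothetical, not from the paper):
# E_{b ~ Bernoulli(theta)}[f(b)] = theta*f(1) + (1-theta)*f(0), so the true
# derivative with respect to theta is f(1) - f(0).
def f(b):
    return 1.0 if b else 0.5

theta = 0.7
exact_grad = f(True) - f(False)  # = 0.5

# "Naive AD" through the sampler: each sampled b is a discrete value with no
# visible dependence on theta, so every per-sample derivative of f(b) is 0.
random.seed(0)
samples = [random.random() < theta for _ in range(1000)]
per_sample_derivs = [0.0 for _ in samples]  # d/dtheta f(b), with b held fixed
naive_grad = sum(per_sample_derivs) / len(per_sample_derivs)

print(exact_grad, naive_grad)  # 0.5 vs 0.0: the naive estimate is biased
```

However many samples are drawn, the naive estimator averages zeros, while the true derivative is 0.5.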

In this paper, we present ADEV, an extension to forward-mode AD that correctly differentiates the expectations of probabilistic processes represented as programs that make random choices. Our algorithm is a source-to-source program transformation on an expressive, higher-order language for probabilistic computation, with both discrete and continuous probability distributions. The result of our transformation is a new probabilistic program, whose expected return value is the derivative of the original program’s expectation. This output program can be run to generate unbiased Monte Carlo estimates of the desired gradient, which can be used within the inner loop of stochastic gradient descent. We prove ADEV correct using logical relations over the denotations of the source and target probabilistic programs. Because it modularly extends forward-mode AD, our algorithm lends itself to a concise implementation strategy, which we exploit to develop a prototype in just a few dozen lines of Haskell (https://github.com/probcomp/adev).
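One classical way such a transformed program can achieve unbiasedness is the score-function (REINFORCE) estimator: multiply the return value by the derivative of the log-probability of the sample. The sketch below illustrates this in Python rather than the paper's Haskell; the names `grad_sample` and `f` and all constants are illustrative assumptions, not ADEV's actual output.

```python
import random

# Hypothetical sketch of an "output program" whose *expected* return value is
# the derivative of E_{b ~ Bernoulli(theta)}[f(b)] with respect to theta.
def f(b):
    return 1.0 if b else 0.5

def grad_sample(theta):
    b = random.random() < theta  # sample b ~ Bernoulli(theta)
    # d/dtheta log p(b; theta): 1/theta if b is True, -1/(1-theta) otherwise.
    score = (1.0 / theta) if b else (-1.0 / (1.0 - theta))
    return f(b) * score          # unbiased single-sample gradient estimate

random.seed(0)
theta = 0.7
n = 200_000
estimate = sum(grad_sample(theta) for _ in range(n)) / n
exact = f(True) - f(False)  # derivative of theta*f(1) + (1-theta)*f(0)
print(estimate, exact)      # Monte Carlo estimate should be close to 0.5
```

Averaging many such single-sample estimates, as in the inner loop of stochastic gradient descent, converges to the true derivative; ADEV's contribution is to derive estimators like this automatically and provably correctly, for a much richer class of programs.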



