Abstract
Optimizing the expected values of probabilistic processes is a central problem in computer science and its applications, arising in fields ranging from artificial intelligence to operations research to statistical computing. Unfortunately, automatic differentiation (AD) techniques developed for deterministic programs do not, in general, compute the correct gradients required by widely used gradient-based optimization methods.
In this paper, we present ADEV, an extension to forward-mode AD that correctly differentiates the expectations of probabilistic processes represented as programs that make random choices. Our algorithm is a source-to-source program transformation on an expressive, higher-order language for probabilistic computation, with both discrete and continuous probability distributions. The result of our transformation is a new probabilistic program, whose expected return value is the derivative of the original program’s expectation. This output program can be run to generate unbiased Monte Carlo estimates of the desired gradient, which can be used within the inner loop of stochastic gradient descent. We prove ADEV correct using logical relations over the denotations of the source and target probabilistic programs. Because it modularly extends forward-mode AD, our algorithm lends itself to a concise implementation strategy, which we exploit to develop a prototype in just a few dozen lines of Haskell (https://github.com/probcomp/adev).
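To make the gap between deterministic AD and a sound estimator concrete, the following is a minimal Python sketch, not the Haskell prototype referenced above. It assumes a toy objective L(θ) = E[b ~ Bernoulli(θ)] of (0 if b else −θ/2), whose exact derivative is θ − 1/2. The names `Dual`, `naive_run`, `flip_reinforce`, `loss_run`, and `mean_tangent` are hypothetical, and the rule used for the random choice is the standard score-function (REINFORCE) estimator written in continuation-passing style, chosen here only as one illustrative gradient estimation strategy.

```python
import random
from dataclasses import dataclass


def _lift(x):
    """Promote a plain number to a dual number with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


@dataclass(frozen=True)
class Dual:
    """Dual number for forward-mode AD: a value and its tangent."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        o = _lift(other)
        return Dual(self.val + o.val, self.tan + o.tan)

    def __mul__(self, other):
        o = _lift(other)
        # Product rule on the tangent component.
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)


def payoff(theta, b):
    """Body of the toy program: 0 if the coin is heads, else -theta/2."""
    return Dual(0.0) if b else theta * -0.5


def naive_run(theta, rng):
    """Deterministic forward-mode AD on one sampled run.

    The draw is treated as a constant, so the dependence of the
    sampling probability on theta is ignored (biased in general).
    """
    b = rng.random() < theta.val
    return payoff(theta, b)


def flip_reinforce(p, k, rng):
    """Score-function rule for a Bernoulli(p) choice with continuation k.

    Adds value * d/dtheta log P(b) to the tangent of the continuation's
    result, which makes the expected tangent equal the true derivative.
    """
    b = rng.random() < p.val
    y = k(b)
    dlogp = p.tan / p.val if b else -p.tan / (1.0 - p.val)
    return Dual(y.val, y.tan + y.val * dlogp)


def loss_run(theta, rng):
    """One run of the toy program using the score-function flip rule."""
    return flip_reinforce(theta, lambda b: payoff(theta, b), rng)


def mean_tangent(run, n, seed=0):
    """Average the tangent component over n independent runs."""
    rng = random.Random(seed)
    return sum(run(rng).tan for _ in range(n)) / n


if __name__ == "__main__":
    theta = Dual(0.4, 1.0)  # seed tangent 1.0: differentiate w.r.t. theta
    print("naive AD      :", mean_tangent(lambda r: naive_run(theta, r), 20000))
    print("score function:", mean_tangent(lambda r: loss_run(theta, r), 20000))
    print("exact         :", theta.val - 0.5)
```

At θ = 0.4 the exact derivative is −0.1. The naive average instead concentrates near −(1 − θ)/2 = −0.3, because it differentiates only the branch that happened to be taken, while the score-function average concentrates near the exact value. Passing the rest of the program as a continuation `k` is what lets the rule for a single random choice account for everything downstream of it.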
Supplemental Material
Paper with appendix: This is a copy of the paper "ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs" that contains the appendices. Appendix A contains a code listing, giving minimal Haskell source code for implementing ADEV. (A fuller-featured online version is maintained at https://github.com/probcomp/adev.) Appendix B describes 11 modular extensions to ADEV, encoding new gradient estimation strategies or variance reduction techniques. Appendix C gives additional figures omitted from the main paper for space, including the semantics of ADEV primitives, details on the static analysis described in Section 5, and a compendium of the logical relations used for our correctness proof at all types in our language. Finally, Appendix D gives a category-theoretic exposition of some of the ideas behind our correctness proof and the quasi-Borel semantics introduced in Section 5.