Abstract
Probabilistic programming is the idea of writing models from statistics and machine learning in program notation and reasoning about these models with generic inference engines. Recently, its combination with deep learning has been explored intensively, leading to the development of so-called deep probabilistic programming languages, such as Pyro, Edward and ProbTorch. At the core of this development lie inference engines based on stochastic variational inference algorithms. When asked for information about the posterior distribution of a model written in such a language, these algorithms convert the posterior-inference query into an optimisation problem and solve it approximately by a form of gradient ascent or descent. In this paper, we analyse one of the most fundamental and versatile variational inference algorithms, called the score estimator or REINFORCE, using tools from denotational semantics and program analysis. We formally express what this algorithm does on models denoted by programs, and expose the implicit assumptions the algorithm makes about those models. Violating these assumptions may lead to an undefined optimisation objective or to the loss of convergence guarantees for the optimisation process. We then describe rules for proving these assumptions, which can be automated by static program analyses. Some of our rules use nontrivial facts from continuous mathematics, and let us replace requirements about integrals in the assumptions, such as integrability of functions defined in terms of programs' denotations, by conditions involving differentiation or boundedness, which are much easier to prove automatically (and manually). Following our general methodology, we have developed a static program analysis for the Pyro programming language that aims at discharging the assumption about what we call model-guide support match.
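As a concrete illustration (not part of the paper's formal development), the score estimator rests on the log-derivative identity ∇_θ E_{x∼q_θ}[f(x)] = E_{x∼q_θ}[f(x) · ∇_θ log q_θ(x)]. The following pure-Python sketch applies it to a toy objective; the choice of f(x) = x², the Gaussian guide N(μ, 1), and the sample size are all illustrative assumptions:

```python
import random

# Score estimator (REINFORCE) for grad_mu E_{x ~ N(mu,1)}[f(x)]
# using  grad = E[ f(x) * d/d(mu) log q(x | mu) ].
# For q = N(mu, 1) the score is  d/d(mu) log q(x | mu) = x - mu.
# With f(x) = x^2, E[f(x)] = mu^2 + 1, so the true gradient is 2*mu.

random.seed(0)
mu = 1.5
N = 200_000
samples = [random.gauss(mu, 1.0) for _ in range(N)]       # draws from the guide q
grad_est = sum(x * x * (x - mu) for x in samples) / N     # Monte Carlo estimate
print(grad_est)  # close to the true gradient 2*mu = 3.0
```

Because the true gradient here is d/dμ (μ² + 1) = 2μ, the estimate can be checked directly; in practice the estimator's high variance is the reason baselines and other variance-reduction techniques matter.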
Our analysis is applied to eight representative model-guide pairs from the Pyro webpage, which include sophisticated neural network models such as AIR. It finds a bug in one of these cases, reveals a non-standard use of an inference engine in another, and shows that the assumptions are met in the remaining six cases.
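To see why the model-guide support match assumption matters, here is a minimal sketch in plain Python (not actual Pyro code; the Exponential model latent and Normal guide are hypothetical choices): when the guide places probability mass outside the support of the model's latent variable, the Monte Carlo ELBO estimate degenerates.

```python
import math
import random

random.seed(1)

def log_model(x):
    # Exponential(1) latent: density exp(-x) on support [0, inf), zero elsewhere
    return -x if x >= 0 else float("-inf")

def log_guide(x):
    # Normal(0, 1) guide: support is all of R
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

xs = [random.gauss(0.0, 1.0) for _ in range(1000)]        # draws from the guide
terms = [log_model(x) - log_guide(x) for x in xs]         # per-sample ELBO terms
elbo_est = sum(terms) / len(terms)
print(elbo_est)  # -inf: the guide has mass where the model density is zero
```

In Pyro, the analogous situation arises when a `pyro.sample` site in the guide uses a distribution whose support differs from that of the matching site in the model; the static analysis described above aims to rule this out before inference runs.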
References
- Sooraj Bhat, Ashish Agarwal, Richard W. Vuduc, and Alexander G. Gray. 2012. A type theory for probability density functions. In Principles of Programming Languages (POPL). 545–556.
- Sooraj Bhat, Johannes Borgström, Andrew D. Gordon, and Claudio V. Russo. 2013. Deriving Probability Density Functions from Probabilistic Functional Programs. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS). 508–522.
- Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul A. Szerlip, Paul Horsfall, and Noah D. Goodman. 2019. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research 20, 28 (2019), 1–6.
- Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, and Marcin Szymczak. 2016. A lambda-calculus foundation for universal probabilistic programming. In International Conference on Functional Programming (ICFP). 33–46.
- Yuri Burda, Roger B. Grosse, and Ruslan Salakhutdinov. 2016. Importance Weighted Autoencoders. In International Conference on Learning Representations (ICLR).
- Bob Carpenter, Andrew Gelman, Matthew Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A Probabilistic Programming Language. Journal of Statistical Software, Articles 76, 1 (2017), 1–32.
- Arun Tejasvi Chaganty, Aditya V. Nori, and Sriram K. Rajamani. 2013. Efficiently Sampling Probabilistic Programs via Program Analysis. In Artificial Intelligence and Statistics (AISTATS). 153–160.
- Aleksandar Chakarov and Sriram Sankaranarayanan. 2013. Probabilistic Program Analysis with Martingales. In Computer Aided Verification (CAV). 511–526.
- Swarat Chaudhuri, Sumit Gulwani, and Roberto Lublinerman. 2010. Continuity analysis of programs. In Principles of Programming Languages (POPL). 57–70.
- Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Principles of Programming Languages (POPL). 238–252.
- Patrick Cousot and Radhia Cousot. 1979. Systematic design of program analysis frameworks. In Principles of Programming Languages (POPL). 269–282.
- Patrick Cousot and Radhia Cousot. 1992. Abstract Interpretation Frameworks. Journal of Logic and Computation 2, 4 (1992), 511–547.
- Patrick Cousot and Michael Monerau. 2012. Probabilistic Abstract Interpretation. In European Symposium on Programming (ESOP). 169–193.
- Thomas Ehrhard, Christine Tasson, and Michele Pagani. 2014. Probabilistic coherence spaces are fully abstract for probabilistic PCF. In Principles of Programming Languages (POPL). 309–320.
- S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, and Geoffrey E. Hinton. 2016. Attend, Infer, Repeat: Fast Scene Understanding with Generative Models. In Neural Information Processing Systems (NIPS). 3233–3241.
- Timon Gehr, Sasa Misailovic, and Martin T. Vechev. 2016. PSI: Exact Symbolic Inference for Probabilistic Programs. In Computer Aided Verification (CAV). 62–83.
- Charles J. Geyer. 2011. Introduction to Markov Chain Monte Carlo. In Handbook of Markov Chain Monte Carlo, Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng (Eds.). Chapman and Hall/CRC, Chapter 1, 3–48.
- Hamid Ghourchian, Amin Gohari, and Arash Amini. 2017. Existence and Continuity of Differential Entropy for a Class of Distributions. IEEE Communications Letters 21, 7 (2017), 1469–1472.
- Noah Goodman, Vikash Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. 2008. Church: a language for generative models. In Uncertainty in Artificial Intelligence (UAI). 220–229.
- Andrew D. Gordon, Thore Graepel, Nicolas Rolland, Claudio Russo, Johannes Borgström, and John Guiver. 2014. Tabular: A Schema-driven Probabilistic Programming Language. In Principles of Programming Languages (POPL). 321–334.
- Peter J. Green. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 4 (1995), 711–732.
- Wilfred Keith Hastings. 1970. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 57, 1 (1970), 97–109.
- Chris Heunen, Ohad Kammar, Sam Staton, and Hongseok Yang. 2017. A convenient category for higher-order probability theory. In Logic in Computer Science (LICS). 1–12.
- Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. 2013. Stochastic Variational Inference. Journal of Machine Learning Research 14 (2013), 1303–1347.
- Chung-Kil Hur, Aditya V. Nori, Sriram K. Rajamani, and Selva Samuel. 2015. A Provably Correct Sampler for Probabilistic Programs. In Foundation of Software Technology and Theoretical Computer Science (FSTTCS). 475–488.
- C. Jones and Gordon D. Plotkin. 1989. A Probabilistic Powerdomain of Evaluations. In Logic in Computer Science (LICS). 186–195.
- Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, and Max Welling. 2014. Semi-supervised Learning with Deep Generative Models. In Neural Information Processing Systems (NIPS). 3581–3589.
- Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In International Conference on Learning Representations (ICLR).
- Oleg Kiselyov. 2016. Probabilistic Programming Language and its Incremental Evaluation. In Asian Symposium on Programming Languages and Systems (APLAS). 357–376.
- Achim Klenke. 2014. Probability Theory: A Comprehensive Course (second ed.). Springer-Verlag London.
- Dexter Kozen. 1981. Semantics of Probabilistic Programs. J. Comput. System Sci. 22, 3 (1981), 328–350.
- Rahul G. Krishnan, Uri Shalit, and David Sontag. 2017. Structured Inference Networks for Nonlinear State Space Models. In AAAI Conference on Artificial Intelligence (AAAI). 2101–2109.
- Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David M. Blei. 2015. Automatic Variational Inference in Stan. In Neural Information Processing Systems (NIPS). 568–576.
- Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. 2017. Automatic Differentiation Variational Inference. Journal of Machine Learning Research 18 (2017), 14:1–14:45.
- Tuan Anh Le, Atilim Gunes Baydin, and Frank Wood. 2017. Inference Compilation and Universal Probabilistic Programming. In Artificial Intelligence and Statistics (AISTATS). 1338–1348.
- Wonyeol Lee, Hangyeol Yu, Xavier Rival, and Hongseok Yang. 2019. Towards Verified Stochastic Variational Inference for Probabilistic Programs. arXiv:1907.08827 (2019).
- Vikash K. Mansinghka, Daniel Selsam, and Yura N. Perov. 2014. Venture: a higher-order probabilistic programming platform with programmable inference. arXiv:1404.0099 (2014).
- Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics 21, 6 (1953), 1087–1092.
- T. Minka, J.M. Winn, J.P. Guiver, S. Webster, Y. Zaykov, B. Yangel, A. Spengler, and J. Bronskill. 2014. Infer.NET 2.6. Microsoft Research Cambridge. http://research.microsoft.com/infernet.
- Matthew Mirman, Timon Gehr, and Martin T. Vechev. 2018. Differentiable Abstract Interpretation for Provably Robust Neural Networks. In International Conference on Machine Learning (ICML). 3575–3583.
- David Monniaux. 2000. Abstract Interpretation of Probabilistic Semantics. In Static Analysis Symposium (SAS). 322–339.
- David Monniaux. 2001. Backwards Abstract Interpretation of Probabilistic Programs. In European Symposium on Programming (ESOP). 367–382.
- Chandra Nair, Balaji Prabhakar, and Devavrat Shah. 2006. On Entropy for Mixtures of Discrete and Continuous Variables. arXiv:cs/0607075 (2006).
- Praveen Narayanan, Jacques Carette, Wren Romano, Chung-chieh Shan, and Robert Zinkov. 2016. Probabilistic inference by program transformation in Hakaru (system description). In Functional and Logic Programming (FLOPS). 62–79.
- Radford M. Neal and Geoffrey E. Hinton. 1998. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants. In Learning in Graphical Models. 355–368.
- Aditya V. Nori, Chung-Kil Hur, Sriram K. Rajamani, and Selva Samuel. 2014. R2: An Efficient MCMC Sampler for Probabilistic Programs. In AAAI Conference on Artificial Intelligence (AAAI). 2476–2482.
- John William Paisley, David M. Blei, and Michael I. Jordan. 2012. Variational Bayesian Inference with Stochastic Search. In International Conference on Machine Learning (ICML). 1363–1370.
- Rajesh Ranganath, Sean Gerrish, and David M. Blei. 2014. Black Box Variational Inference. In Artificial Intelligence and Statistics (AISTATS). 814–822.
- Rajesh Ranganath, Linpeng Tang, Laurent Charlin, and David Blei. 2015. Deep Exponential Families. In Artificial Intelligence and Statistics (AISTATS). 762–771.
- Adam Scibior, Ohad Kammar, Matthijs Vákár, Sam Staton, Hongseok Yang, Yufei Cai, Klaus Ostermann, Sean K. Moss, Chris Heunen, and Zoubin Ghahramani. 2018. Denotational validation of higher-order Bayesian inference. PACMPL 2, POPL (2018), 60:1–60:29.
- N. Siddharth, Brooks Paige, Jan-Willem van de Meent, Alban Desmaison, Noah D. Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. 2017. Learning Disentangled Representations with Semi-Supervised Deep Generative Models. In Neural Information Processing Systems (NIPS). 5927–5937.
- Steffen Smolka, Praveen Kumar, Nate Foster, Dexter Kozen, and Alexandra Silva. 2017. Cantor meets Scott: semantic foundations for probabilistic networks. In Principles of Programming Languages (POPL). 557–571.
- Akash Srivastava and Charles A. Sutton. 2017. Autoencoding Variational Inference For Topic Models. In International Conference on Learning Representations (ICLR).
- Sam Staton. 2017. Commutative Semantics for Probabilistic Programming. In European Symposium on Programming (ESOP). 855–879.
- Sam Staton, Hongseok Yang, Frank D. Wood, Chris Heunen, and Ohad Kammar. 2016. Semantics for probabilistic programming: higher-order functions, continuous distributions, and soft constraints. In Logic in Computer Science (LICS). 525–534.
- Neil Toronto, Jay McCarthy, and David Van Horn. 2015. Running Probabilistic Programs Backwards. In European Symposium on Programming (ESOP). 53–79.
- Dustin Tran, Matthew D. Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, and Alexey Radul. 2018. Simple, Distributed, and Accelerated Probabilistic Programming. In Neural Information Processing Systems (NeurIPS). 7609–7620.
- Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja R. Rudolph, Dawen Liang, and David M. Blei. 2016. Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787 (2016).
- Uber AI Labs. 2019a. Pyro examples. http://pyro.ai/examples/. Version used: April 1, 2019.
- Uber AI Labs. 2019b. Pyro regression test suite. https://github.com/pyro-ppl/pyro/blob/dev/tests/infer/test_valid_models.py. Version used: March 1, 2019.
- Matthijs Vákár, Ohad Kammar, and Sam Staton. 2019. A domain theory for statistical probabilistic programming. PACMPL 3, POPL (2019), 36:1–36:29.
- Jan-Willem van de Meent, Brooks Paige, David Tolpin, and Frank D. Wood. 2016. Black-Box Policy Search with Probabilistic Programs. In Artificial Intelligence and Statistics (AISTATS). 1195–1204.
- Di Wang, Jan Hoffmann, and Thomas W. Reps. 2018. PMAF: an algebraic framework for static analysis of probabilistic programs. In Programming Language Design and Implementation (PLDI). 513–528.
- Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 3-4 (1992), 229–256.
- David Wingate and Theophane Weber. 2013. Automated Variational Inference in Probabilistic Programming. arXiv:1301.1299 (2013).
- Frank Wood, Jan Willem van de Meent, and Vikash Mansinghka. 2014. A New Approach to Probabilistic Programming Inference. In Artificial Intelligence and Statistics (AISTATS). 1024–1032.
- Yi Wu, Siddharth Srivastava, Nicholas Hay, Simon Du, and Stuart J. Russell. 2018. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. In International Conference on Machine Learning (ICML). 5339–5348.
- Hongseok Yang. 2019. Implementing Inference Algorithms for Probabilistic Programs. https://github.com/hongseok-yang/probprog19/blob/master/Lectures/Lecture6/Note6.pdf. Lecture Note of the 2019 Course on Probabilistic Programming at KAIST.
Towards verified stochastic variational inference for probabilistic programs