Research article · Open Access · Artifacts Available · Artifacts Evaluated & Reusable

Towards verified stochastic variational inference for probabilistic programs

Published: 20 December 2019

Abstract

Probabilistic programming is the idea of writing models from statistics and machine learning using program notations and reasoning about these models using generic inference engines. Recently its combination with deep learning has been explored intensely, leading to the development of so-called deep probabilistic programming languages, such as Pyro, Edward and ProbTorch. At the core of this development lie inference engines based on stochastic variational inference algorithms. When asked to find information about the posterior distribution of a model written in such a language, these algorithms convert this posterior-inference query into an optimisation problem and solve it approximately by a form of gradient ascent or descent. In this paper, we analyse one of the most fundamental and versatile variational inference algorithms, called the score estimator or REINFORCE, using tools from denotational semantics and program analysis. We formally express what this algorithm does on models denoted by programs, and expose implicit assumptions made by the algorithm on the models. The violation of these assumptions may lead to an undefined optimisation objective or the loss of convergence guarantees for the optimisation process. We then describe rules for proving these assumptions, which can be automated by static program analyses. Some of our rules use nontrivial facts from continuous mathematics, and let us replace requirements about integrals in the assumptions, such as integrability of functions defined in terms of programs' denotations, by conditions involving differentiation or boundedness, which are much easier to prove automatically (and manually). Following our general methodology, we have developed a static program analysis for the Pyro programming language that aims at discharging the assumption about what we call model-guide support match.
Our analysis is applied to eight representative model-guide pairs from the Pyro webpage, which include sophisticated neural network models such as AIR. It finds a bug in one of these cases, reveals a non-standard use of an inference engine in another, and shows that the assumptions are met in the remaining six cases.
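To make the abstract's central object concrete, the following is a minimal, hypothetical sketch of the score estimator (REINFORCE) it refers to: the gradient of an expectation under a parameterised distribution is rewritten as the expectation of the integrand times the score function, ∇θ E_{z∼qθ}[f(z)] = E_{z∼qθ}[f(z) ∇θ log qθ(z)], and approximated by Monte Carlo sampling. The distribution, objective, and all names below are illustrative assumptions, not the paper's actual formulation; this is also the setting where the paper's integrability assumptions are implicitly at work, since the rewrite is only valid when the relevant expectations exist.

```python
import numpy as np

def score_estimator(theta, f, n_samples=200_000, seed=0):
    """Monte Carlo score (REINFORCE) estimate of d/dtheta E_{z ~ N(theta,1)}[f(z)].

    Uses the identity  grad = E[ f(z) * d/dtheta log q_theta(z) ],
    which for q_theta = Normal(theta, 1) gives the score  z - theta.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)
    score = z - theta  # d/dtheta log q_theta(z) for a unit-variance Gaussian
    return float(np.mean(f(z) * score))

# Illustrative check: for f(z) = z^2 we have E[f] = theta^2 + 1,
# so the exact gradient is 2 * theta.
theta = 1.5
grad_estimate = score_estimator(theta, lambda z: z ** 2)
```

Note the estimator never differentiates `f` itself, which is why the method is so versatile (it applies to discrete and non-differentiable models) but also why its correctness hinges on the kind of integrability and support assumptions the paper makes explicit.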


Supplemental Material

a16-lee.webm


