
Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference

Published: 11 January 2023

Abstract

We present a static analysis for discovering differentiable, or more generally smooth, parts of a given probabilistic program, and show how the analysis can be used to improve the pathwise gradient estimator, one of the most popular methods for posterior inference and model learning. Our improvement extends the scope of the estimator from differentiable models to non-differentiable ones without requiring manual intervention by the user: the improved estimator automatically identifies differentiable parts of a given probabilistic program using our static analysis, applies the pathwise gradient estimator to those parts, and uses a more general but less efficient estimator, called the score estimator, for the rest of the program. Our analysis has a surprisingly subtle soundness argument, partly due to the misbehaviour of some target smoothness properties when viewed from the perspective of program-analysis designers. For instance, smoothness properties such as partial differentiability and partial continuity are not preserved by function composition, which makes it difficult to analyse sequential composition soundly without heavily sacrificing precision. We formulate five assumptions on a target smoothness property, prove the soundness of our analysis under those assumptions, and show that our leading examples satisfy them. We also show that, by using information from our analysis instantiated for differentiability, our improved gradient estimator satisfies an important differentiability requirement and thus computes the correct estimate on average (i.e., returns an unbiased estimate) under a regularity condition. Our experiments with representative probabilistic programs in the Pyro language show that our static analysis identifies the smooth parts of those programs accurately, enabling our improved pathwise gradient estimator to exploit all the opportunities for high performance in those programs.
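To illustrate the two estimators the abstract contrasts, here is a minimal, self-contained sketch (not the paper's actual estimator): both estimate the gradient of E[f(z)] with respect to θ for z ~ N(θ, 1). The pathwise estimator reparameterises z = θ + ε and differentiates through the path, so it needs f to be differentiable; the score estimator only needs samples of f, at the cost of higher variance. The helper names `pathwise_grad` and `score_grad` are illustrative, not from the paper.

```python
import random

def pathwise_grad(theta, df, n=100000):
    # Pathwise (reparameterisation) estimator: z = theta + eps, eps ~ N(0, 1),
    # so d/dtheta f(theta + eps) = f'(theta + eps). Requires f differentiable.
    total = 0.0
    for _ in range(n):
        eps = random.gauss(0.0, 1.0)
        total += df(theta + eps)
    return total / n

def score_grad(theta, f, n=100000):
    # Score-function (REINFORCE) estimator: E[f(z) * d/dtheta log N(z; theta, 1)].
    # The score of N(theta, 1) w.r.t. theta is (z - theta). No derivative of f
    # is needed, so non-differentiable f is fine, but the variance is higher.
    total = 0.0
    for _ in range(n):
        z = random.gauss(theta, 1.0)
        total += f(z) * (z - theta)
    return total / n

# Sanity check with a differentiable f(z) = z^2, where both apply:
# E[z^2] = theta^2 + 1 for z ~ N(theta, 1), so the true gradient is 2 * theta.
```

A hybrid estimator in the spirit of the paper would use `pathwise_grad`-style differentiation on the program parts the smoothness analysis certifies as differentiable, and fall back to `score_grad`-style terms elsewhere.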

