Abstract
Probabilistic programming languages are valuable because they allow domain experts to express probabilistic models and inference algorithms without worrying about irrelevant details. However, for decades there remained an important and popular class of probabilistic inference algorithms whose efficient implementation required manual low-level coding that is tedious and error-prone. They are algorithms whose idiomatic expression requires random array variables that are latent or whose likelihood is conjugate. Although that is how practitioners communicate and compose these algorithms on paper, executing such expressions requires eliminating the latent variables and recognizing the conjugacy by symbolic mathematics. Moreover, matching the performance of handwritten code requires speeding up loops by more than a constant factor.
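To make "eliminating the latent variables and recognizing the conjugacy" concrete, here is a minimal sketch on a deliberately tiny model that does not come from the paper: a latent coin bias with a Beta prior and Bernoulli observations. The function names, the model, and the grid size are all assumptions chosen for illustration. The closed form is the kind of result symbolic mathematics can produce, and the brute-force integral is what one would otherwise have to compute.

```python
from fractions import Fraction

def collapsed_predictive(a, b, observations):
    """P(next = 1 | observations) with the latent bias integrated out.

    Assumes bias ~ Beta(a, b) with integer a, b, and each observation
    ~ Bernoulli(bias). Conjugacy gives (a + heads) / (a + b + n).
    """
    heads = sum(observations)
    n = len(observations)
    return Fraction(a + heads, a + b + n)

def numeric_predictive(a, b, observations, grid=20_000):
    """Same quantity by midpoint-rule integration over the latent bias."""
    heads = sum(observations)
    tails = len(observations) - heads
    num = den = 0.0
    for k in range(grid):
        p = (k + 0.5) / grid
        # Unnormalized posterior density of the bias at p.
        w = p ** (a - 1 + heads) * (1 - p) ** (b - 1 + tails)
        num += w * p
        den += w
    return num / den
```

Both functions agree on the answer, but the collapsed version needs only two counts, whereas the numeric version loops over a grid for every query.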
We show how probabilistic programs that directly and concisely express these desired inference algorithms can be compiled while maintaining efficiency. We introduce new transformations that turn high-level probabilistic programs with arrays into pure loop code. We then exploit domain-specific invariants and norms to optimize the code, and to specialize and JIT-compile the code per execution. The resulting performance is competitive with manual implementations.
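As a rough illustration of speeding up a loop by more than a constant factor, the sketch below compares two ways of computing, for each position in an array of labels, how many other positions share its label. This is a hypothetical example, not the paper's transformation: the function names and the task are assumptions. The point is only that replacing a nested loop with a single histogram pass changes the asymptotic cost from quadratic to linear.

```python
from collections import Counter

def same_label_counts_naive(z):
    """For each i, count positions j != i with z[j] == z[i]. Runs in O(n^2)."""
    n = len(z)
    return [sum(1 for j in range(n) if j != i and z[j] == z[i])
            for i in range(n)]

def same_label_counts_histogram(z):
    """Same result in O(n): build one histogram, then exclude z[i] itself."""
    hist = Counter(z)
    return [hist[label] - 1 for label in z]
```

For a label array such as `[0, 1, 0, 2, 0, 1]`, both functions return the same list, but only the histogram version scales to long arrays.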
From high-level inference algorithms to efficient code