research-article
Open access

Formally Verified Samplers from Probabilistic Programs with Loops and Conditioning

Published: 06 June 2023

Abstract

We present Zar, a formally verified compiler pipeline from discrete probabilistic programs with unbounded loops in the conditional probabilistic guarded command language (cpGCL) to proved-correct executable samplers in the random bit model. We exploit the key idea that all discrete probability distributions can be reduced to unbiased coin-flipping schemes. The compiler pipeline first translates cpGCL programs into choice-fix trees, an intermediate representation suitable for reduction of biased probabilistic choices. Choice-fix trees are then translated to coinductive interaction trees for execution within the random bit model. The correctness of the composed translations establishes the sampling equidistribution theorem: compiled samplers are correct with respect to the conditional weakest pre-expectation semantics of cpGCL source programs. Zar is implemented and fully verified in the Coq proof assistant. We extract verified samplers to OCaml and Python and empirically validate them on a number of illustrative examples.
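The idea of reducing a biased choice to unbiased coin flips can be illustrated with a small sketch. This is not Zar's verified pipeline, which works through choice-fix trees and interaction trees. It is ordinary rejection sampling, shown only to make the random bit model concrete. The names `uniform_below`, `bernoulli`, and `flip` are hypothetical and chosen for this example.

```python
import random
from fractions import Fraction
from typing import Callable

# A source of fair bits: each call returns 0 or 1 with probability 1/2.
# (Hypothetical interface for this sketch, not Zar's API.)
BitSource = Callable[[], int]


def uniform_below(d: int, flip: BitSource) -> int:
    """Return a uniform integer in [0, d) using only fair bits.

    Draws k = bit length of (d - 1) bits to form a candidate and
    rejects candidates >= d. The loop is unbounded but terminates
    with probability 1.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    k = (d - 1).bit_length()
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | flip()
        if x < d:
            return x


def bernoulli(p: Fraction, flip: BitSource) -> bool:
    """Return True with probability exactly p (a rational in [0, 1])."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return uniform_below(p.denominator, flip) < p.numerator


if __name__ == "__main__":
    rng = random.Random(0)
    fair_bit = lambda: rng.getrandbits(1)
    n = 10_000
    hits = sum(bernoulli(Fraction(2, 3), fair_bit) for _ in range(n))
    print("empirical frequency of True:", hits / n)
```

Because the sampler consumes bits only through `flip`, it can be driven by any bit source, including a fixed sequence for deterministic testing.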


