
A model-learner pattern for Bayesian reasoning

Published: 23 January 2013

Abstract

A Bayesian model is based on a pair of probability distributions, known as the prior and sampling distributions. A wide range of fundamental machine learning tasks, including regression, classification, clustering, and many others, can all be seen as Bayesian models. We propose a new probabilistic programming abstraction, a typed Bayesian model, which is based on a pair of probabilistic expressions for the prior and sampling distributions. A sampler for a model is an algorithm to compute synthetic data from its sampling distribution, while a learner for a model is an algorithm for probabilistic inference on the model. Models, samplers, and learners form a generic programming pattern for model-based inference. They support the uniform expression of common tasks including model testing, and generic compositions such as mixture models, evidence-based model averaging, and mixtures of experts. A formal semantics supports reasoning about model equivalence and implementation correctness. By developing a series of examples and three learner implementations based on exact inference, factor graphs, and Markov chain Monte Carlo, we demonstrate the broad applicability of this new programming pattern.
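
As a rough illustration of the pattern the abstract describes, the following minimal sketch represents a model as a prior paired with a sampling distribution, with a sampler and a learner defined generically over it. It is not the paper's API: the names `Model`, `sample`, `learn`, and `bernoulli` are hypothetical, the parameter space is assumed to be finite, observations are assumed to be binary, and the learner uses plain enumeration as a stand-in for the exact-inference, factor-graph, and MCMC learners mentioned above.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Model:
    """Hypothetical model type: a prior paired with a sampling distribution."""
    prior: Dict[float, float]                  # P(w) over a finite set of parameters
    likelihood: Callable[[float, int], float]  # P(y | w) for one binary observation y


def sample(model: Model, n: int, rng: random.Random) -> List[int]:
    """Sampler: draw w from the prior, then n synthetic observations given w."""
    ws = list(model.prior)
    w = rng.choices(ws, weights=[model.prior[v] for v in ws])[0]
    p_one = model.likelihood(w, 1)  # probability of observing y = 1 under w
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def learn(model: Model, data: List[int]) -> Dict[float, float]:
    """Learner: exact posterior P(w | data) by enumeration (Bayes' rule)."""
    unnorm = {}
    for w, p in model.prior.items():
        for y in data:
            p *= model.likelihood(w, y)
        unnorm[w] = p
    z = sum(unnorm.values())  # model evidence P(data)
    return {w: p / z for w, p in unnorm.items()}


def bernoulli(w: float, y: int) -> float:
    """Sampling distribution of a coin with bias w (an assumed example)."""
    return w if y == 1 else 1.0 - w


# A coin whose bias is either 0.2 or 0.8, equally likely a priori.
coin = Model(prior={0.2: 0.5, 0.8: 0.5}, likelihood=bernoulli)
posterior = learn(coin, [1, 1, 0, 1])
```

Because `sample` and `learn` depend only on the `Model` interface, the same two functions would work unchanged for any other finite-parameter model, which is the kind of uniformity the pattern is meant to provide.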

Published in

ACM SIGPLAN Notices, Volume 48, Issue 1 (POPL '13), January 2013, 561 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2480359

POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, January 2013, 586 pages
ISBN: 9781450318327
DOI: 10.1145/2429069

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article