Fabular: regression formulas as probabilistic programming

Published: 11 January 2016
Abstract

Regression formulas are a domain-specific language adopted by several R packages for describing an important and useful class of statistical models: hierarchical linear regressions. Formulas are succinct, expressive, and clearly popular, so are they a useful addition to probabilistic programming languages? And what do they mean? We propose a core calculus of hierarchical linear regression, in which regression coefficients are themselves defined by nested regressions (unlike in R). We explain how our calculus captures the essence of the formula DSL found in R. We describe the design and implementation of Fabular, a version of the Tabular schema-driven probabilistic programming language, enriched with formulas based on our regression calculus. To the best of our knowledge, this is the first formal description of the core ideas of R's formula notation, the first development of a calculus of regression formulas, and the first demonstration of the benefits of composing regression formulas and latent variables in a probabilistic programming language.
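As a rough illustration of what a regression formula denotes, the sketch below gives a generative reading of the lme4-style formula `y ~ 1 + x + (1 | g)`: a population intercept, a slope on `x`, and a per-group intercept offset that is itself drawn from a Gaussian. It is not Fabular or Tabular syntax and not the calculus proposed here. The function name `simulate`, the priors, and the standard deviations are assumptions chosen only for the example.

```python
import random


def simulate(xs, groups, n_groups, noise_sd=0.1, seed=0):
    """Sample data under a generative reading of  y ~ 1 + x + (1 | g).

    Hypothetical helper for illustration; the priors below are arbitrary.
    """
    rng = random.Random(seed)
    intercept = rng.gauss(0.0, 1.0)  # the "1" term: population-level intercept
    slope = rng.gauss(0.0, 1.0)      # the "x" term: coefficient of the predictor
    # The "(1 | g)" term: one intercept offset per group. Each offset is
    # itself a random quantity, which is what makes the model hierarchical.
    offsets = [rng.gauss(0.0, 0.5) for _ in range(n_groups)]
    ys = [
        intercept + slope * x + offsets[g] + rng.gauss(0.0, noise_sd)
        for x, g in zip(xs, groups)
    ]
    return {"intercept": intercept, "slope": slope, "offsets": offsets, "ys": ys}


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
    groups = [0, 0, 0, 1, 1, 1]  # group index g for each row
    model = simulate(xs, groups, n_groups=2)
    for x, g, y in zip(xs, groups, model["ys"]):
        print(f"g={g} x={x:.1f} y={y:.3f}")
```

Inference runs this process in reverse: given observed `xs`, `groups`, and `ys`, it estimates the intercept, slope, and group offsets. Letting a coefficient such as `slope` be defined by its own regression, as the calculus allows, would correspond to replacing the single draw of `slope` with a per-group expression of the same shape as the one used for `ys`.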
