Abstract

We propose a new kind of probabilistic programming language for machine learning. We write programs simply by annotating existing relational schemas with probabilistic model expressions. We describe a detailed design of our language, Tabular, complete with a formal semantics and a type system. A rich series of examples illustrates the expressiveness of Tabular. We report on an implementation and show evidence of the succinctness of our notation relative to current best practice. Finally, we describe and verify a transformation of Tabular schemas that predicts missing values in a concrete database. The ability to query for missing values provides a uniform interface to a wide variety of tasks, including classification, clustering, recommendation, and ranking.
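The idea of answering a task by querying for missing values can be illustrated outside Tabular. The sketch below is plain Python, not Tabular syntax, and is not the paper's transformation. The schema (a `label` column plus two boolean feature columns), the symmetric Dirichlet pseudo-count `ALPHA`, and all function names are hypothetical. Each column is annotated with a categorical distribution, and the feature columns depend on `label`. This yields a naive Bayes model, so filling in a missing `label` amounts to classification. For simplicity, it conditions only on fully observed rows and uses conjugate posterior-mean estimates rather than a general inference engine.

```python
from collections import Counter

# Hypothetical annotated schema (illustration only, not Tabular syntax):
# "label" ~ Categorical(theta), theta ~ Dirichlet(ALPHA)
# each feature ~ Categorical(phi[label]), phi[label] ~ Dirichlet(ALPHA)
LABELS = ("spam", "ham")
FEATURES = {"has_link": (True, False), "all_caps": (True, False)}
ALPHA = 1.0  # symmetric Dirichlet pseudo-count (assumed)


def fit_counts(rows):
    """Count labels and (label, feature, value) triples over rows whose label is observed."""
    label_counts, feat_counts = Counter(), Counter()
    for row in rows:
        if row["label"] is None:
            continue  # skip rows with a missing label
        label_counts[row["label"]] += 1
        for f in FEATURES:
            feat_counts[(row["label"], f, row[f])] += 1
    return label_counts, feat_counts


def predict_label(row, label_counts, feat_counts):
    """Return a normalized distribution over labels for one row, given its features."""
    n = sum(label_counts.values())
    scores = {}
    for c in LABELS:
        # Posterior mean of P(label = c) under the Dirichlet prior.
        p = (label_counts[c] + ALPHA) / (n + ALPHA * len(LABELS))
        for f, values in FEATURES.items():
            # Posterior mean of P(feature f = observed value | label = c).
            p *= (feat_counts[(c, f, row[f])] + ALPHA) / (label_counts[c] + ALPHA * len(values))
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def fill_missing(rows):
    """Return a copy of rows with each missing label replaced by its most probable value."""
    label_counts, feat_counts = fit_counts(rows)
    filled = []
    for row in rows:
        if row["label"] is None:
            dist = predict_label(row, label_counts, feat_counts)
            row = {**row, "label": max(dist, key=dist.get)}  # copy; input is not mutated
        filled.append(row)
    return filled


# Example table with one missing label (made-up data).
rows = [
    {"label": "spam", "has_link": True, "all_caps": True},
    {"label": "spam", "has_link": True, "all_caps": False},
    {"label": "ham", "has_link": False, "all_caps": False},
    {"label": "ham", "has_link": False, "all_caps": False},
    {"label": None, "has_link": True, "all_caps": True},
]

if __name__ == "__main__":
    print(fill_missing(rows)[-1])
```

The same query shape (leave a cell empty, ask for its predicted value) would serve clustering if the label column were never observed. That case needs genuine latent-variable inference, which this count-based sketch does not perform.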
Tabular: a schema-driven probabilistic programming language. POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages.