
Tabular: a schema-driven probabilistic programming language

Published: 08 January 2014

Abstract

We propose a new kind of probabilistic programming language for machine learning. We write programs simply by annotating existing relational schemas with probabilistic model expressions. We describe a detailed design of our language, Tabular, complete with formal semantics and type system. A rich series of examples illustrates the expressiveness of Tabular. We report an implementation, and show evidence of the succinctness of our notation relative to current best practice. Finally, we describe and verify a transformation of Tabular schemas so as to predict missing values in a concrete database. The ability to query for missing values provides a uniform interface to a wide variety of tasks, including classification, clustering, recommendation, and ranking.
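To make the idea of answering a missing-value query concrete, here is a minimal Python sketch. It is not Tabular syntax and not the paper's inference method. It assumes a hypothetical People table in which the Gender column is modeled by a discrete prior and the Height column by a Gaussian whose mean and variance depend on Gender. It also replaces Bayesian inference with maximum-likelihood point estimates computed from the rows where Gender is observed. Under those assumptions, filling in a missing Gender from Height is simply classification.

```python
import math
from collections import defaultdict

# Hypothetical annotated schema (illustration only, NOT Tabular syntax):
#   table People
#     Gender : string  ~ Discrete(prior)                    (may be missing)
#     Height : real    ~ Gaussian(mean[Gender], var[Gender])


def fit(rows):
    """Point-estimate prior, mean and variance per Gender from rows where
    Gender is observed (maximum likelihood, a stand-in for Bayesian inference)."""
    groups = defaultdict(list)
    for row in rows:
        if row["Gender"] is not None:
            groups[row["Gender"]].append(row["Height"])
    total = sum(len(hs) for hs in groups.values())
    params = {}
    for gender, heights in groups.items():
        mean = sum(heights) / len(heights)
        var = sum((h - mean) ** 2 for h in heights) / len(heights)
        # Floor the variance so a group of identical heights cannot divide by zero.
        params[gender] = (len(heights) / total, mean, max(var, 1e-6))
    return params


def gaussian_pdf(x, mean, var):
    """Density of a Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fill_missing(rows, params):
    """Return a copy of rows in which each missing Gender is replaced by its
    posterior distribution given the row's Height (Bayes' rule)."""
    filled = []
    for row in rows:
        if row["Gender"] is None:
            joint = {g: p * gaussian_pdf(row["Height"], m, v)
                     for g, (p, m, v) in params.items()}
            evidence = sum(joint.values())
            # dict(row, ...) builds a new row, so the input table is not mutated.
            row = dict(row, Gender={g: w / evidence for g, w in joint.items()})
        filled.append(row)
    return filled


people = [
    {"Gender": "F", "Height": 160.0},
    {"Gender": "F", "Height": 165.0},
    {"Gender": "M", "Height": 175.0},
    {"Gender": "M", "Height": 182.0},
    {"Gender": None, "Height": 181.0},  # missing value to predict
]
params = fit(people)
filled = fill_missing(people, params)
print(filled[-1]["Gender"])
```

Observed cells pass through unchanged, and only the missing cell is replaced by a distribution. A column that is never observed would instead have to be learned jointly with the parameters, which corresponds to the clustering use case the abstract mentions; this sketch does not handle that case.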

Supplemental Material

d2_left_t6.mp4