DOI: 10.1145/2487575.2487579
poster

Accurate intelligible models with pairwise interactions

Published: 11 August 2013

Abstract

Standard generalized additive models (GAMs) usually model the dependent variable as a sum of univariate models. Although previous studies have shown that standard GAMs can be interpreted by users, their accuracy is significantly lower than that of more complex models that permit interactions.
In this paper, we propose adding selected pairwise interaction terms to standard GAMs. The resulting models, which we call GA2M-models (Generalized Additive Models plus Interactions), consist of univariate terms and a small number of pairwise interaction terms. Because these models include only one- and two-dimensional components, each component of a GA2M-model can be visualized and interpreted by users. To explore the large (quadratic) number of feature pairs, we develop a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion in the model.
In a large-scale empirical study, we show the effectiveness of FAST in ranking candidate pairs of features. In addition, we show the surprising result that GA2M-models have almost the same performance as the best full-complexity models on a number of real datasets. Thus this paper postulates that for many problems, GA2M-models can yield models that are both intelligible and accurate.
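
The model form described in the abstract (an intercept, univariate terms, and a few pairwise terms) can be illustrated with a minimal sketch. This is not the FAST algorithm from the paper: it scores every pair by brute force, using a piecewise-constant table fitted to residuals, purely to show what "ranking candidate pairs" means. All function names, the binning scheme, and the toy XOR data are assumptions made for illustration.

```python
from itertools import combinations, product

def bin_index(value, cuts):
    """Index of the bin that `value` falls into, given sorted cut points."""
    return sum(value >= c for c in cuts)

def cell(x, i, j, edges):
    """2-D bin of sample x for the feature pair (i, j)."""
    return (bin_index(x[i], edges[i]), bin_index(x[j], edges[j]))

def fit_pair_table(rows, residuals, i, j, edges):
    """Mean residual per 2-D bin: a piecewise-constant interaction term."""
    sums, counts = {}, {}
    for x, r in zip(rows, residuals):
        c = cell(x, i, j, edges)
        sums[c] = sums.get(c, 0.0) + r
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def pair_score(rows, residuals, i, j, edges):
    """Drop in squared error if the pair table were added to the model."""
    table = fit_pair_table(rows, residuals, i, j, edges)
    before = sum(r * r for r in residuals)
    after = sum((r - table[cell(x, i, j, edges)]) ** 2
                for x, r in zip(rows, residuals))
    return before - after

def rank_pairs(rows, residuals, edges):
    """Exhaustively score all feature pairs, best first (naive, not FAST)."""
    n_features = len(rows[0])
    scored = [((i, j), pair_score(rows, residuals, i, j, edges))
              for i, j in combinations(range(n_features), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def predict(x, intercept, main_terms, pair_terms, edges):
    """GA2M-style prediction: intercept + 1-D lookups + 2-D lookups."""
    total = intercept
    for i, table in main_terms.items():
        total += table.get(bin_index(x[i], edges[i]), 0.0)
    for (i, j), table in pair_terms.items():
        total += table.get(cell(x, i, j, edges), 0.0)
    return total

# Toy data (assumed): three binary features, target is XOR of features 0 and 1.
rows = list(product([0, 1], repeat=3))
ys = [x[0] ^ x[1] for x in rows]
edges = [[0.5], [0.5], [0.5]]          # one cut point per feature

intercept = sum(ys) / len(ys)
residuals = [y - intercept for y in ys]  # main effects are flat on this data

ranked = rank_pairs(rows, residuals, edges)
best_pair = ranked[0][0]
pair_terms = {best_pair: fit_pair_table(rows, residuals, *best_pair, edges)}

for pair, score in ranked:
    print(pair, round(score, 3))
```

On this toy data no single feature carries any signal, so a purely univariate model cannot improve on the intercept, while the one selected pairwise table recovers the target. Each fitted table is one- or two-dimensional, which is what keeps the individual components easy to plot and inspect.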



Published In

KDD '13: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2013
1534 pages
ISBN: 9781450321747
DOI: 10.1145/2487575


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. interaction detection
  3. regression

Qualifiers

  • Poster

Conference

KDD '13

Acceptance Rates

KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
