DOI: 10.1145/1014052.1014122
Article

Discovering additive structure in black box functions

Published: 22 August 2004

Abstract

Many automated learning procedures lack interpretability and operate, in effect, as a black box: they provide a prediction tool but no explanation of the underlying dynamics that drive it. A common approach to interpretation is to plot the dependence of a learned function on one or two predictors. We present a method that seeks not to display the behavior of a function, but to evaluate the importance of non-additive interactions within any set of variables. Should the function be close to a sum of low-dimensional components, these components can be viewed and even modeled parametrically. Alternatively, the work here indicates where intrinsically high-dimensional behavior takes place.

The calculations used in this paper correspond closely to the functional ANOVA decomposition, a well-developed construction in statistics. In particular, the proposed score of interaction importance measures the loss associated with projecting the prediction function onto a space of additive models. The algorithm runs in linear time, and we display its output as a graphical model of the function for interpretation purposes.
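To make the idea of an interaction score concrete, here is a minimal sketch. It is not the paper's algorithm and it assumes independent inputs, each uniform on [-1, 1]. Write the black box as a functional ANOVA sum f(x) = sum over subsets w of f_w(x_w), whose components are orthogonal under independent inputs. For a set of variable indices u, the sketch estimates by Monte Carlo the total variance of all components f_w with w containing every index in u (a "superset importance", in the spirit of the mean-dimensionality work cited as [10]). Under independence this quantity equals the squared L2 loss of projecting f onto the space of functions with no interaction among all of u. The names `superset_importance` and `example_f`, the sample size, and the seed are hypothetical choices made for illustration.

```python
import itertools
import random


def superset_importance(f, u, dim, n_samples=20000, seed=0):
    """Monte Carlo estimate of the summed variance of every functional ANOVA
    component whose variable set contains all indices in u.

    Sketch only: assumes `dim` independent inputs, each Uniform(-1, 1), and a
    black box `f` mapping a list of floats to a float (hypothetical interface).
    """
    rng = random.Random(seed)
    u = list(u)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        z = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        # Alternating sum over subsets v of u: indices in v keep x, indices in
        # u \ v take z, and indices outside u keep x. Components that do not
        # involve every index of u cancel in this sum.
        diff = 0.0
        for k in range(len(u) + 1):
            sign = (-1) ** (len(u) - k)
            for v in itertools.combinations(u, k):
                point = list(x)
                for i in u:
                    if i not in v:
                        point[i] = z[i]
                diff += sign * f(point)
        total += diff * diff
    # Each surviving component contributes 2**|u| times its variance to E[diff**2].
    return total / (n_samples * 2 ** len(u))


def example_f(x):
    # Hypothetical test function: additive in x[0] and x[1], plus a single
    # pairwise interaction between x[0] and x[2].
    return x[0] + x[1] + 2.0 * x[0] * x[2]


if __name__ == "__main__":
    for u in ([0, 1], [0, 2]):
        print(u, round(superset_importance(example_f, u, dim=3), 3))
```

For `example_f` = x1 + x2 + 2*x1*x3 (indices 0, 1, 2), the only interaction is between x1 and x3. Its component 2*x1*x3 has variance 4/9 under this input distribution, so the score for [0, 2] should come out near 0.444, while the score for [0, 1] is zero up to floating-point rounding. Each sample costs 2^|u| evaluations of the black box, so for a fixed u the cost grows linearly with `n_samples`.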

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, 1994.
[2] L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[3] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, and H. Hofmann. XGvis: Interactive data visualization with multidimensional scaling, 2001. http://www.research.att.com/areas/stat/xgobi/index.html.
[4] S. E. Fienberg. The Analysis of Cross-Classified Categorical Data. MIT Press, 1980.
[5] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189-1232, 2001.
[6] D. Harrison and D. L. Rubinfeld. Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5:81-102, 1978.
[7] W. Hoeffding. A class of statistics with asymptotically normal distributions. Annals of Mathematical Statistics, 19:293-325, 1948.
[8] G. Hooker. Black box diagnostics and the problem of extrapolation: Extending the functional ANOVA. Technical report, Stanford University, 2004.
[9] T. Jiang and A. B. Owen. Quasi-regression with shrinkage. Mathematics and Computers in Simulation, 62(3-6):231-241, 2003.
[10] R. Liu and A. B. Owen. Estimating mean dimensionality. Technical report, Stanford University, 2003.
[11] A. B. Owen. The dimension distribution and quadrature test functions. Statistica Sinica, 13(1), 2003.
[12] R-project. http://www.r-project.org/.
[13] C. Roosen. Visualization and Exploration of High-Dimensional Functions Using the Functional ANOVA Decomposition. PhD thesis, Stanford University, 1995.
[14] I. M. Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55:271-280, 2001.

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN: 1581138881
DOI: 10.1145/1014052
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. additive models
  2. diagnostics
  3. graphical models
  4. feature selection
  5. functional ANOVA
  6. interpretation
  7. visualization

Qualifiers

  • Article

Conference

KDD04

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 154
  • Downloads (last 6 weeks): 26
Reflects downloads up to 23 Sep 2024

Cited By

  • (2024) Interaction Difference Hypothesis Test for Prediction Models. Machine Learning and Knowledge Extraction 6:2 (1298-1322). DOI: 10.3390/make6020061. Online publication date: 14-Jun-2024.
  • (2024) Extending Instance Space Analysis to Algorithm Configuration Spaces. Proceedings of the Genetic and Evolutionary Computation Conference Companion (147-150). DOI: 10.1145/3638530.3654264. Online publication date: 14-Jul-2024.
  • (2024) An interpretable machine learning methodology to generate interaction effect hypotheses from complex datasets. Decision Sciences. DOI: 10.1111/deci.12642. Online publication date: 13-Aug-2024.
  • (2024) Survey on Explainable AI: Techniques, challenges and open issues. Expert Systems with Applications 255 (124710). DOI: 10.1016/j.eswa.2024.124710. Online publication date: Dec-2024.
  • (2024) Scientific Inference with Interpretable Machine Learning: Analyzing Models to Learn About Real-World Phenomena. Minds and Machines 34:3. DOI: 10.1007/s11023-024-09691-z. Online publication date: 15-Jul-2024.
  • (2024) Marginal effects for non-linear prediction functions. Data Mining and Knowledge Discovery 38:5 (2997-3042). DOI: 10.1007/s10618-023-00993-x. Online publication date: 27-Feb-2024.
  • (2024) Formal Definition of Interpretability and Explainability in XAI. Intelligent Systems and Applications (133-151). DOI: 10.1007/978-3-031-66431-1_9. Online publication date: 31-Jul-2024.
  • (2023) A new PHO-rmula for improved performance of semi-structured networks. Proceedings of the 40th International Conference on Machine Learning (29291-29305). DOI: 10.5555/3618408.3619626. Online publication date: 23-Jul-2023.
  • (2023) Informed Random Forest to Model Associations of Epidemiological Priors, Government Policies, and Public Mobility. MDM Policy & Practice 8:2. DOI: 10.1177/23814683231218716. Online publication date: 26-Dec-2023.
  • (2023) Chemical reaction motifs driving non-equilibrium behaviours in phase separating materials. Journal of The Royal Society Interface 20:208. DOI: 10.1098/rsif.2023.0117. Online publication date: Nov-2023.
