DOI: 10.1145/3412815.3416893

Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Published: 18 October 2020

Abstract

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to interpret tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Boosted Tree models, and propose new, adaptive prototype selection methods with theoretical guarantees, with the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes can perform as well as or even better than the original tree ensemble when used as a nearest-prototype classifier. In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.
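
To make the pipeline in the abstract concrete, here is a minimal sketch, assuming scikit-learn, of the three ingredients it names: a tree-space distance, per-class prototype selection, and a nearest-prototype classifier. It is illustrative rather than the paper's exact method: the distance is Breiman-style random forest proximity (the fraction of trees that send two points to different leaves), the selection is a plain greedy facility-location rule with a fixed number of prototypes per class, and the helper names tree_distance and greedy_prototypes and the choice k = 3 are assumptions made for the example; the paper's GBT-specific distance and adaptive per-class prototype counts differ in detail.

    # Minimal sketch (not the paper's exact method): RF-proximity distance,
    # greedy per-class prototype selection, nearest-prototype classification.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    # Leaf indices, shape (n_samples, n_trees): two points are close in
    # "tree space" when many trees route them to the same leaf.
    leaves_tr = rf.apply(X_tr)

    def tree_distance(leaves_a, leaves_b):
        """Fraction of trees in which two points land in different leaves
        (one minus Breiman's random forest proximity); values in [0, 1]."""
        return (leaves_a[:, None, :] != leaves_b[None, :, :]).mean(axis=2)

    def greedy_prototypes(Dc, k):
        """Greedy facility-location selection over a within-class distance
        matrix Dc; returns indices of k prototypes that most reduce the
        summed distance from class members to their nearest prototype."""
        best = np.ones(Dc.shape[0])      # distance to a phantom prototype (cap = 1)
        chosen = []
        for _ in range(min(k, Dc.shape[0])):
            gains = np.maximum(best[None, :] - Dc, 0).sum(axis=1)
            gains[chosen] = -1.0         # never re-pick a prototype
            j = int(np.argmax(gains))
            chosen.append(j)
            best = np.minimum(best, Dc[j])
        return chosen

    D = tree_distance(leaves_tr, leaves_tr)

    k = 3  # illustrative fixed budget; the paper adapts the count per class
    proto_idx = []
    for c in np.unique(y_tr):
        idx = np.where(y_tr == c)[0]
        proto_idx.extend(idx[greedy_prototypes(D[np.ix_(idx, idx)], k)])
    proto_idx = np.array(proto_idx)

    # Classify test points by their nearest prototype in tree space.
    D_te = tree_distance(rf.apply(X_te), leaves_tr[proto_idx])
    y_pred = y_tr[proto_idx][D_te.argmin(axis=1)]
    print("nearest-prototype accuracy:", (y_pred == y_te).mean())
    print("random forest accuracy:   ", rf.score(X_te, y_te))

The greedy coverage objective above is monotone submodular, so this particular selection rule carries the standard (1 - 1/e) approximation guarantee; the guarantees the abstract claims are of this flavor, though stated for the paper's own adaptive selection methods.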

      Published In

      FODS '20: Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference
      October 2020
      196 pages
      ISBN: 9781450381031
      DOI: 10.1145/3412815

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 October 2020


      Author Tags

      1. interpretability
      2. prototypes
      3. tree ensemble classifiers

      Qualifiers

      • Research-article

      Funding Sources

      • National Science Foundation
      • National Institutes of Health

      Conference

      FODS '20: ACM-IMS Foundations of Data Science Conference
      October 19 - 20, 2020
      Virtual Event, USA

      Article Metrics

      • Downloads (Last 12 months)60
      • Downloads (Last 6 weeks)8
      Reflects downloads up to 23 Sep 2024

      Cited By

      • (2024) Survey on Explainable AI: Techniques, challenges and open issues. Expert Systems with Applications, 255 (124710). DOI: 10.1016/j.eswa.2024.124710. Online publication date: Dec-2024.
      • (2024) Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree. EURO Journal on Computational Optimization (100084). DOI: 10.1016/j.ejco.2024.100084. Online publication date: Jan-2024.
      • (2024) Example-Based Explanations of Random Forest Predictions. Advances in Intelligent Data Analysis XXII (185-196). DOI: 10.1007/978-3-031-58553-1_15. Online publication date: 16-Apr-2024.
      • (2023) Exploring Evaluation Methods for Interpretable Machine Learning: A Survey. Information, 14:8 (469). DOI: 10.3390/info14080469. Online publication date: 21-Aug-2023.
      • (2023) A Survey on Explainable Anomaly Detection. ACM Transactions on Knowledge Discovery from Data, 18:1 (1-54). DOI: 10.1145/3609333. Online publication date: 6-Sep-2023.
      • (2023) Explainable Activity Recognition for Smart Home Systems. ACM Transactions on Interactive Intelligent Systems, 13:2 (1-39). DOI: 10.1145/3561533. Online publication date: 5-May-2023.
      • (2023) Trustworthy AI: From Principles to Practices. ACM Computing Surveys, 55:9 (1-46). DOI: 10.1145/3555803. Online publication date: 16-Jan-2023.
      • (2023) An overview of XAI Algorithms. 2023 International Automatic Control Conference (CACS) (1-5). DOI: 10.1109/CACS60074.2023.10326174. Online publication date: 26-Oct-2023.
      • (2023) Considerations when learning additive explanations for black-box models. Machine Learning. DOI: 10.1007/s10994-023-06335-8. Online publication date: 19-Jun-2023.
      • (2023) Benchmarking and survey of explanation methods for black box models. Data Mining and Knowledge Discovery, 37:5 (1719-1778). DOI: 10.1007/s10618-023-00933-9. Online publication date: 3-Jun-2023.
