Research article
DOI: 10.1145/3447548.3467387
KDD '21 Conference Proceedings

ControlBurn: Feature Selection by Sparse Forests

Published: 14 August 2021

Abstract

Tree ensembles distribute feature importance evenly among the members of a group of correlated features, so the ranking of every feature in the group is suppressed; this reduces interpretability and complicates feature selection. In this paper we present ControlBurn, a feature selection algorithm that uses a weighted LASSO to prune unnecessary features from tree ensembles, just as low-intensity fire reduces overgrown vegetation. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group of features to a single feature. Moreover, the algorithm is efficient: unlike iterative wrapper-based feature selection methods, it requires only a single training iteration. We show that ControlBurn performs substantially better than feature selection methods of comparable computational cost on datasets with correlated features.
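
The abstract describes the mechanism only at a high level: grow a tree ensemble once, solve a weighted non-negative LASSO over the trees so that trees touching many features pay a larger penalty, and keep exactly the features used by trees that receive nonzero weight. The Python sketch below illustrates that idea under assumptions not stated in the abstract (squared loss, bagged depth-3 regression trees, and a per-tree penalty equal to the number of distinct split features); it is a minimal illustration of the technique, not the authors' released ControlBurn implementation.

# A minimal sketch of the weighted-LASSO-over-trees idea, under the
# assumptions stated above; not the authors' released package.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=0)

# 1. Grow a forest of shallow trees on bootstrap samples; this is the
#    single training pass the abstract refers to.
trees = []
for _ in range(100):
    idx = rng.integers(0, len(X), len(X))
    trees.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

def features_used(tree):
    # Internal nodes store their split feature; leaves store -2.
    f = tree.tree_.feature
    return set(f[f >= 0])

# 2. Each tree's predictions form one column of a design matrix.
P = np.column_stack([t.predict(X) for t in trees])

# 3. Scale column j by 1 / (number of features tree j uses): under
#    scikit-learn's uniform LASSO penalty this charges feature-hungry
#    trees more, giving a weighted non-negative LASSO over trees.
costs = np.array([len(features_used(t)) for t in trees])
lasso = Lasso(alpha=5.0, positive=True).fit(P / costs, y)

# 4. The selected features are those used by any tree with nonzero weight.
selected = sorted(set().union(*(features_used(t)
                                for t, w in zip(trees, lasso.coef_)
                                if w > 0)))
print("selected features:", selected)

Scaling each tree's prediction column by the reciprocal of its feature count is what turns the uniform LASSO penalty into a feature-weighted one: a unit of weight on a feature-hungry tree costs proportionally more, so the solver favors trees that reuse the same few features, which is how a correlated group collapses onto a single representative.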


Cited By

  • (2024) FAST: An Optimization Framework for Fast Additive Segmentation in Transparent ML. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1863-1874. DOI: 10.1145/3637528.3671996. Published online: 25 August 2024.
  • (2024) Online Feature Selection With Varying Feature Spaces. IEEE Transactions on Knowledge and Data Engineering 36(9), 4806-4819. DOI: 10.1109/TKDE.2024.3377243. Published online: September 2024.
  • (2023) XAI-MethylMarker: Explainable AI approach for biomarker discovery for breast cancer subtype classification using methylation data. Expert Systems with Applications 225, 120130. DOI: 10.1016/j.eswa.2023.120130. Published online: September 2023.


Information

Published In
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021, 4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 5

Reflects downloads up to 23 September 2024.

