DOI: 10.1145/3397271.3401229
Short paper

Evidence Weighted Tree Ensembles for Text Classification

Published: 25 July 2020

Abstract

Text documents are often mapped to vectors of binary values, where 1 indicates the presence of a word and 0 indicates its absence. The vectors are then used to train predictive models. In tree-based ensemble models, predictions from some decision trees may be made purely from absent words. Such predictions should be trusted less, as absent words can be interpreted in multiple ways. In this work, we propose to improve the comprehensibility and accuracy of ensemble models by distinguishing word presence from word absence. The presented method weights predictions based on word presence. Experimental results on 35 real text datasets indicate that our method outperforms state-of-the-art ensemble methods on various text classification tasks.
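The core idea can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the dict-based tree representation and the weighting scheme (1 + the number of tested features with value 1 on the decision path) are assumptions made for the example.

```python
# Sketch: weight each tree's vote by its "evidence", i.e. the number of
# tested words that are actually present (value 1) along the decision path.

def path_evidence(tree, x):
    """Walk a dict-based decision tree on binary vector x.
    Returns (predicted label, count of present-word tests on the path)."""
    node, present = tree, 0
    while "label" not in node:
        feat = node["feature"]
        present += x[feat]                 # 1 if the tested word is present
        node = node["right"] if x[feat] else node["left"]
    return node["label"], present

def evidence_weighted_predict(trees, x):
    """Aggregate tree votes, each weighted by 1 + its present-word count."""
    scores = {}
    for tree in trees:
        label, present = path_evidence(tree, x)
        scores[label] = scores.get(label, 0) + (1 + present)
    return max(scores, key=scores.get)

# Two toy stumps over a binary bag-of-words vector x = [w0, w1].
t1 = {"feature": 0, "left": {"label": "neg"}, "right": {"label": "pos"}}
t2 = {"feature": 1, "left": {"label": "neg"}, "right": {"label": "pos"}}

# Word 0 present, word 1 absent: t1 reasons from a present word (weight 2),
# t2 reasons purely from an absent word (weight 1), so "pos" wins.
print(evidence_weighted_predict([t1, t2], [1, 0]))
```

Here the tree whose prediction rests only on an absent word contributes less to the final vote, which is the intuition the abstract describes.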

Supplementary Material

MP4 File (3397271.3401229.mp4)
This video presents the paper titled "Evidence Weighted Tree Ensembles for Text Classification". We present a method to assign reliability weights to individual predictions by incorporating the evidence, i.e., the number of features with value 1 used for making predictions.


Cited By

  • Research Paper Classification Using Machine and Deep Learning Techniques. In Proceedings of the 2024 9th International Conference on Intelligent Information Technology, 352–358. DOI: 10.1145/3654522.3654557. Online publication date: 23 Feb 2024.


Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. decision semantics
  2. ensemble methods
  3. text classification

Qualifiers

  • Short-paper

Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
