Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

Published: 09 November 2010

Abstract

Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in exactly how to compute accuracy, F-measure, and Area Under the ROC Curve (AUC) in cross-validation studies. However, these details are not discussed in the literature, and different papers and software packages use incompatible methods, which leads to inconsistency across the research literature. Anomalies in performance calculations for particular folds and situations go undiscovered when they are buried in results aggregated over many folds and datasets, without anyone ever looking at the intermediate performance measurements. This research note clarifies and illustrates the differences and provides guidance on how best to measure classification performance under cross-validation. In particular, several divergent methods are used to compute F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods lead to biased measurements, especially under high class imbalance. This paper is of particular interest to those designing machine learning software libraries and to researchers focused on high class imbalance.
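To make the divergence concrete, the sketch below computes F-measure over the same cross-validation folds in three plausible ways: averaging per-fold F-measures, combining fold-averaged precision and recall, and pooling the confusion counts across folds. The abstract does not enumerate the methods, so these three, the function names, the convention of scoring undefined ratios as 0.0, and the fold counts are all illustrative assumptions rather than the paper's definitions or data.

```python
# Illustrative sketch only: three ways one might aggregate F-measure over
# cross-validation folds. Names, conventions, and counts are assumptions.

def precision(tp, fp):
    # Assumed convention: precision is 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Assumed convention: recall is 0.0 when the fold has no positives.
    return tp / (tp + fn) if (tp + fn) else 0.0

def f_measure(tp, fp, fn):
    # F1 from raw counts; 0.0 when the denominator is zero (assumed convention).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f_average_of_folds(folds):
    # Method A: compute F-measure in each fold, then average.
    return sum(f_measure(tp, fp, fn) for tp, fp, fn in folds) / len(folds)

def f_from_mean_precision_recall(folds):
    # Method B: average precision and recall over folds, then combine.
    p = sum(precision(tp, fp) for tp, fp, _ in folds) / len(folds)
    r = sum(recall(tp, fn) for tp, _, fn in folds) / len(folds)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def f_from_pooled_counts(folds):
    # Method C: sum the counts over all folds, then compute F-measure once.
    tp = sum(f[0] for f in folds)
    fp = sum(f[1] for f in folds)
    fn = sum(f[2] for f in folds)
    return f_measure(tp, fp, fn)

# Hypothetical (TP, FP, FN) counts for four folds with few positives each.
folds = [(2, 0, 0), (0, 1, 2), (1, 3, 1), (0, 0, 2)]

print(f"average of per-fold F:        {f_average_of_folds(folds):.4f}")
print(f"F of mean precision/recall:   {f_from_mean_precision_recall(folds):.4f}")
print(f"F of pooled confusion counts: {f_from_pooled_counts(folds):.4f}")
```

On these made-up counts the three aggregations give different values for the same predictions, and the fourth fold, which has no positive predictions, shows how a convention for undefined precision can silently shift the averaged variants. The sketch does not attempt to show which method is unbiased; that is the subject of the paper's experiments.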


Published In

ACM SIGKDD Explorations Newsletter, Volume 12, Issue 1
June 2010
77 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1882471
Publisher

Association for Computing Machinery

New York, NY, United States
