
How Complex Is Your Classification Problem?: A Survey on Measuring Classification Complexity

Publication: ACM Computing Surveys, Article No. 107. https://doi.org/10.1145/3347711

Abstract

Characteristics extracted from the training datasets of classification problems have proven to be effective predictors in a number of meta-analyses. Among them, measures of classification complexity can be used to estimate the difficulty in separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the known measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn focus on the challenges highlighted by such characteristics of the problems. This article surveys and analyzes measures that can be extracted from the training datasets to characterize the complexity of the respective classification problems. Their use in the recent literature is also reviewed and discussed, revealing opportunities for future work in the area. Finally, we describe the Extended Complexity Library (ECoL), a publicly available R package that implements a set of complexity measures.
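To make the idea concrete, here is a minimal Python sketch (not taken from the article) of one classic complexity measure: the maximum Fisher's discriminant ratio, commonly called F1 in the data-complexity literature. The function name and synthetic data below are illustrative assumptions; the measure itself scores, per feature, how far apart the two class means are relative to the within-class variances, so well-separated problems score high and overlapping ones score low.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Maximum Fisher's discriminant ratio (F1) over all features of a
    two-class dataset: f_i = (mu1_i - mu2_i)^2 / (s1_i^2 + s2_i^2).
    A high value means at least one feature separates the classes well."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    assert len(classes) == 2, "this sketch handles two-class problems only"
    X1, X2 = X[y == classes[0]], X[y == classes[1]]
    num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X2.var(axis=0)
    # Guard against zero-variance features rather than dividing by zero.
    ratios = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return ratios.max()

rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)

# Classes centered far apart: an "easy" problem, high F1.
easy = fisher_discriminant_ratio(
    np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))]),
    labels)

# Heavily overlapping classes: a "hard" problem, low F1.
hard = fisher_discriminant_ratio(
    np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(0.5, 1, (50, 2))]),
    labels)
```

Note that F1 only detects separability along individual feature axes; problems that are separable only by an oblique or nonlinear boundary can still score low, which is why the survey covers complementary families of measures (neighborhood-based, linearity-based, and so on).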

