research-article
Open Access

Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning–Based Classification Models

Published: 12 March 2020

Abstract

Imbalanced data can seriously degrade a predictive model, and most under-sampling techniques are time-consuming and discard samples that contain critical information, especially in the biomedical field. To solve these problems, we developed an active balancing mechanism (ABM) based on the valuable information contained in biomedical data. ABM uses the Gaussian naïve Bayes method to estimate the object samples and entropy as a query function to evaluate sample information, and it retains only the valuable samples of the majority class to achieve under-sampling. The Physikalisch-Technische Bundesanstalt diagnostic electrocardiogram (ECG) database, including 5,173 normal ECG samples and 26,654 myocardial infarction ECG samples, is used to verify the validity of ABM. At imbalance rates of 13 and 5, experimental results show that ABM takes 7.7 seconds and 13.2 seconds, respectively; both are significantly faster than five conventional under-sampling methods. In addition, at the imbalance rate of 13, ABM-based data obtained the highest accuracy of 92.23% and 97.52% using support vector machines and modified convolutional neural networks (MCNNs) with eight layers, respectively. At the imbalance rate of 5, the data processed by ABM also achieved the best accuracy of 92.31% and 98.46% with support vector machines and MCNNs, respectively. Furthermore, ABM performs better than the two compared methods in F1-measure, G-means, and area under the curve. Consequently, ABM could be a useful and effective approach for dealing with imbalanced data in general, and biomedical myocardial infarction ECG datasets in particular, and the MCNN can also achieve higher performance than the state of the art.
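
The abstract names the ingredients of ABM (a Gaussian naïve Bayes estimator, entropy as the query function, and retention of only the most informative majority-class samples) but not the full procedure. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: the model is fitted once on all samples, each majority-class sample is scored by the Shannon entropy of its posterior, and the `keep` highest-scoring samples are retained. The function names, the `keep` parameter, and the single-pass design are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Fit per-class priors, feature means and variances (Gaussian naive Bayes)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps  # eps avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def posterior(model, row):
    """Return class posteriors P(c | row), normalised with the log-sum-exp trick."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, var in zip(row, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_joint[label] = ll
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    return {c: math.exp(v - top) / total for c, v in log_joint.items()}


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_undersample(X, y, majority, keep):
    """Keep every non-majority sample plus the `keep` highest-entropy majority samples.

    Assumption: high posterior entropy is used as the proxy for a "valuable" sample.
    """
    model = fit_gaussian_nb(X, y)
    scored = [
        (entropy(posterior(model, row).values()), i)
        for i, (row, label) in enumerate(zip(X, y))
        if label == majority
    ]
    scored.sort(reverse=True)  # most uncertain majority samples first
    kept = {i for _, i in scored[:keep]}
    idx = [i for i, label in enumerate(y) if label != majority or i in kept]
    return [X[i] for i in idx], [y[i] for i in idx]
```

Calling `entropy_undersample(X, y, majority=1, keep=k)` returns the reduced feature rows and labels in their original order. In this sketch, majority samples near the class boundary receive the highest entropy and are the ones retained; how the published method chooses the number of retained samples is not stated in the abstract.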

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
January 2020
376 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3388236

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 12 March 2020
• Revised: 1 August 2019
• Accepted: 1 August 2019
• Received: 1 May 2019

Qualifiers

• research-article
• Research
• Refereed
