Abstract
Imbalanced data can seriously degrade a predictive model, and most under-sampling techniques are time-consuming and discard samples that carry critical information, a problem that is especially acute in the biomedical field. To address these problems, we developed an active balancing mechanism (ABM) that exploits the valuable information contained in biomedical data. ABM uses a Gaussian naïve Bayes model to estimate the target samples, scores each sample's information content with entropy as the query function, and retains only the valuable majority-class samples to achieve under-sampling. The Physikalisch-Technische Bundesanstalt diagnostic electrocardiogram (ECG) database, which includes 5,173 normal ECG samples and 26,654 myocardial infarction ECG samples, is used to verify the validity of ABM. At imbalance rates of 13 and 5, ABM takes 7.7 seconds and 13.2 seconds, respectively; in both cases it is significantly faster than five conventional under-sampling methods. At the imbalance rate of 13, data processed by ABM yielded the highest accuracy, 92.23% with support vector machines and 97.52% with eight-layer modified convolutional neural networks (MCNNs). At the imbalance rate of 5, data processed by ABM also yielded the best accuracy, 92.31% with support vector machines and 98.46% with MCNNs. Furthermore, ABM outperforms the two compared methods in F1-measure, G-means, and area under the curve. Consequently, ABM could be a useful and effective approach for handling imbalanced data in general, and biomedical myocardial infarction ECG datasets in particular, and the MCNN can also achieve higher performance than the state of the art.
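The under-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the variance floor, the `n_keep` budget, and the rule "keep the majority samples with the highest posterior entropy" are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption) avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def posterior(model, row):
    """Return P(class | row) for every class under the naive Bayes model."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[label] = ll
    top = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(ll - top) for c, ll in log_joint.items()}
    total = sum(unnorm.values())
    return {c: p / total for c, p in unnorm.items()}


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_undersample(X, y, majority_label, n_keep):
    """Keep every minority sample and the n_keep highest-entropy majority samples."""
    model = fit_gaussian_nb(X, y)
    scored = [
        (entropy(posterior(model, row).values()), i)
        for i, (row, label) in enumerate(zip(X, y))
        if label == majority_label
    ]
    # Highest entropy first; ties are broken by original position.
    scored.sort(key=lambda t: (-t[0], t[1]))
    kept_majority = {i for _, i in scored[:n_keep]}
    keep = [
        i for i, label in enumerate(y)
        if label != majority_label or i in kept_majority
    ]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Calling `entropy_undersample(X, y, majority_label, n_keep)` on a list of feature rows `X` and labels `y` returns the reduced data in its original order. Ranking by entropy favours majority samples near the class boundary, where the model is least certain; whether ABM uses this exact ranking or a threshold is not stated in the abstract.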
Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning–Based Classification Models