Abstract
Supervised learning on gene expression data can provide an early warning sign that supports early treatment of lung cancer and helps reduce death rates. However, gene expression samples collected in realistic settings contain considerable noise. In this study, we present a random forest with a self-paced learning bootstrap to improve lung cancer classification and prognosis based on gene expression data. Specifically, we propose an ensemble learning approach based on random forest that improves classification performance by combining multiple classifiers. We then investigate a sampling strategy that uses self-paced learning to include samples gradually, from high quality to low quality. Experimental results on five public lung cancer datasets show that the proposed method can accurately select significant genes, which improves classification performance compared with existing approaches. We believe the proposed method has the potential to assist doctors in gene selection and lung cancer prognosis.
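The sampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the common hard self-paced rule (a sample is eligible when its current loss is below a threshold), and the names `self_paced_weights`, `self_paced_bootstrap`, `pace_schedule`, and the toy loss values are all hypothetical. In a full pipeline, each bootstrap sample would presumably train one tree of the forest, and the losses would be recomputed between rounds.

```python
import random


def self_paced_weights(losses, lam):
    """Hard self-paced weights: 1 if a sample's loss is below lam, else 0."""
    return [1 if loss < lam else 0 for loss in losses]


def self_paced_bootstrap(losses, lam, n_draws, rng):
    """Draw indices with replacement from the currently eligible samples."""
    weights = self_paced_weights(losses, lam)
    eligible = [i for i, v in enumerate(weights) if v]
    if not eligible:
        # Assumption: fall back to the single lowest-loss sample so that
        # a bootstrap sample can always be drawn.
        eligible = [min(range(len(losses)), key=losses.__getitem__)]
    return [rng.choice(eligible) for _ in range(n_draws)]


def pace_schedule(lam0, growth, n_rounds):
    """Growing thresholds, so harder samples are admitted in later rounds."""
    return [lam0 * growth ** r for r in range(n_rounds)]


# Toy per-sample losses (hypothetical): small means high quality,
# large means noisy or hard to classify.
toy_losses = [0.05, 0.10, 0.40, 0.90, 2.50]
rng = random.Random(0)

rounds = []
for lam in pace_schedule(lam0=0.2, growth=3.0, n_rounds=3):
    idx = self_paced_bootstrap(toy_losses, lam, n_draws=len(toy_losses), rng=rng)
    # Record which distinct samples were drawn at this threshold.
    rounds.append((lam, sorted(set(idx))))
```

With this schedule the pool of eligible samples grows from the two lowest-loss samples to the four lowest-loss ones, while the noisiest sample (loss 2.50) stays excluded in all three rounds.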
Random Forest with Self-Paced Bootstrap Learning in Lung Cancer Prognosis