research-article

Random Forest with Self-Paced Bootstrap Learning in Lung Cancer Prognosis

Published: 17 April 2020

Abstract

Training supervised learning models on gene expression data can provide an early warning sign that supports early treatment of lung cancer and helps reduce death rates. However, in realistic settings the gene feature samples contain considerable noise. In this study, we present a random forest with a self-paced learning bootstrap to improve lung cancer classification and prognosis based on gene expression data. Specifically, we propose an ensemble learning approach based on random forest that improves classification performance by selecting multiple classifiers. We then investigate a sampling strategy that uses self-paced learning to gradually include samples, moving from high-quality to low-quality ones. Experimental results on five public lung cancer datasets show that the proposed method selects significant genes accurately and improves classification performance over existing approaches. We believe that the proposed method has the potential to assist doctors in gene selection and lung cancer prognosis.
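
The sampling idea in the abstract, admitting high-quality samples first and lower-quality ones later, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sample quality is measured or how the pace is scheduled, so the sketch assumes quality is ranked by a per-sample loss from some current model and that the admitted fraction grows linearly. The names `self_paced_bootstrap`, `pace_schedule`, and `pace` are hypothetical.

```python
import math
import random


def self_paced_bootstrap(losses, pace, n_draws, rng):
    """Bootstrap indices from the `pace` fraction of lowest-loss samples.

    Assumption: a lower loss means a higher-quality (easier) sample.
    """
    if not 0.0 < pace <= 1.0:
        raise ValueError("pace must be in (0, 1]")
    # Rank sample indices from lowest loss to highest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    # Admit the easiest fraction; always keep at least one sample.
    n_keep = max(1, math.ceil(pace * len(losses)))
    eligible = ranked[:n_keep]
    # Bootstrap step: draw with replacement from the admitted pool.
    return [rng.choice(eligible) for _ in range(n_draws)]


def pace_schedule(n_rounds, start=0.5):
    """Hypothetical schedule: grow the admitted fraction linearly to 1.0."""
    if n_rounds == 1:
        return [1.0]
    step = (1.0 - start) / (n_rounds - 1)
    return [min(1.0, start + k * step) for k in range(n_rounds)]


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.05, 0.90, 0.10, 0.40, 0.75, 0.20]  # made-up losses
    for pace in pace_schedule(3):
        idx = self_paced_bootstrap(losses, pace, n_draws=6, rng=rng)
        print(f"pace={pace:.2f} -> bootstrap indices {idx}")
```

In a random forest setting, each round's index list would stand in for the usual uniform bootstrap sample used to fit one tree, so early trees see only the lowest-loss samples and later trees see the full, noisier set.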

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
        Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
        January 2020
        376 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3388236
        Copyright © 2020 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 April 2020
        • Accepted: 1 July 2019
        • Revised: 1 June 2019
        • Received: 1 February 2019

        Qualifiers

        • research-article
        • Research
        • Refereed
