DOI: 10.1145/3200947.3201029
research-article

Pathway analysis using XGBoost classification in Biomedical Data

Published: 09 July 2018

ABSTRACT

Given that our biological existence is rooted in a complex cellular system with thousands of interactions among genes and metabolites, the research community in the biological and medical fields has shifted its interest to network-based approaches. This complexity is reflected in networks that encode the relationships among a system's components. The shift has also given rise to a new research field, Network Medicine, which combines Network Science and Systems Biology and applies them to human diseases. Among the cutting-edge approaches in this direction are subpathway-based methods, which identify "active subpathways" (local sub-structures within pathways) related to the condition under study.
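Subpathway methods differ in how they cut a pathway into local sub-structures, and the abstract does not say which rule this work uses. As one generic illustration (not necessarily the authors' procedure), the sketch below keeps only the interactions between "active" genes and reports each remaining connected piece of the pathway graph as a candidate subpathway. The function name active_subpathways, the toy gene identifiers, and the criterion for calling a gene active are all hypothetical.

```python
from collections import defaultdict


def active_subpathways(edges, active_genes, min_size=2):
    """Return the connected sub-structures of a pathway that contain only active genes.

    edges:        (source, target) gene pairs of one pathway graph (toy input here;
                  parsing a real pathway database is not shown)
    active_genes: genes flagged as "active" for the condition under study
                  (ASSUMPTION: the activity criterion, e.g. differential
                  expression, is left to the caller)
    min_size:     smallest subpathway, in genes, that is reported
    """
    # Keep only edges whose two endpoints are active, ignoring edge direction so
    # that weakly connected components are recovered.
    adjacency = defaultdict(set)
    for source, target in edges:
        if source in active_genes and target in active_genes:
            adjacency[source].add(target)
            adjacency[target].add(source)

    seen = set()
    subpathways = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            subpathways.append(sorted(component))
    return subpathways


# Hypothetical toy pathway: g4 is inactive, so the chain g1-g2-g3-g4-g5 breaks
# and the now isolated active gene g5 yields no subpathway.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g6", "g7")]
active = {"g1", "g2", "g3", "g5", "g6", "g7"}
print(active_subpathways(edges, active))  # [['g1', 'g2', 'g3'], ['g6', 'g7']]
```

Treating edges as undirected is a simplifying choice; a method that respects signaling direction would instead search for directed paths or strongly connected pieces.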

Motivated by these network-based approaches, we propose a classification scheme based on XGBoost, a recent gradient-boosted tree algorithm, to detect the most discriminative pathways related to a disease. We then extract subpathways and rank them by their ability to correctly classify samples from different experimental conditions. We demonstrate our method on an aging gene expression dataset, providing evidence that XGBoost outperforms other well-known classification methods on biological data, while the results of our method include several established as well as recently reported longevity-associated pathways.
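The abstract does not give the scoring details. One plausible reading, sketched here under stated assumptions, scores every gene set (a pathway or an extracted subpathway) by the k-fold accuracy of a classifier trained on that set's genes alone, then sorts the sets best first. To keep the example dependency-free, the classifier is a hand-written, heavily simplified stand-in for XGBoost: second-order gradient boosting of depth-1 trees on the logistic loss, using the regularized leaf weight -G/(H+λ) and split gain from Chen and Guestrin's XGBoost paper. It is not the xgboost library, and every function name, parameter value, and the toy data are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0, gamma=0.0):
    """Simplified XGBoost-style booster: depth-1 trees, logistic loss, labels 0/1.

    Each round uses first- and second-order gradients (g, h) of the loss, scores a
    split by 0.5*[GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma and sets leaf
    weights to -G/(H+lam). ASSUMPTIONS: no column sampling, no missing-value
    handling, base score fixed at log-odds 0.
    """
    n, d = len(X), len(X[0])
    margin = [0.0] * n
    stumps = []
    for _ in range(n_rounds):
        p = [_sigmoid(m) for m in margin]
        g = [p[i] - y[i] for i in range(n)]          # gradient of the log loss
        h = [p[i] * (1.0 - p[i]) for i in range(n)]  # hessian of the log loss
        G, H = sum(g), sum(h)
        best = None
        for j in range(d):
            order = sorted(range(n), key=lambda i: X[i][j])
            GL = HL = 0.0
            for k in range(n - 1):
                GL += g[order[k]]
                HL += h[order[k]]
                lo, hi = X[order[k]][j], X[order[k + 1]][j]
                if lo == hi:
                    continue  # no threshold separates equal values
                GR, HR = G - GL, H - HL
                gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)) - gamma
                if best is None or gain > best[0]:
                    best = (gain, j, (lo + hi) / 2.0, -GL / (HL + lam), -GR / (HR + lam))
        if best is None or best[0] <= 0:
            break  # no split improves the regularized objective
        _, j, thr, w_left, w_right = best
        stumps.append((j, thr, eta * w_left, eta * w_right))  # shrink by learning rate
        for i in range(n):
            margin[i] += eta * w_left if X[i][j] < thr else eta * w_right
    return stumps


def predict(stumps, x):
    margin = sum(wl if x[j] < thr else wr for j, thr, wl, wr in stumps)
    return 1 if margin > 0 else 0


def cv_accuracy(X, y, genes, gene_set, k=3):
    """k-fold accuracy of a booster trained only on the genes in gene_set."""
    col = {gene: c for c, gene in enumerate(genes)}
    cols = [col[gene] for gene in gene_set if gene in col]
    Xs = [[row[c] for c in cols] for row in X]
    correct = 0
    for fold in range(k):
        # Deterministic folds (sample index modulo k) keep the toy run reproducible.
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = fit_boosted_stumps([Xs[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, Xs[i]) == y[i] for i in test)
    return correct / len(y)


def rank_gene_sets(X, y, genes, gene_sets, k=3):
    """Rank pathways (or subpathways) by cross-validated accuracy, best first."""
    scores = {name: cv_accuracy(X, y, genes, members, k) for name, members in gene_sets.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: 6 samples x 2 genes; gA tracks the label, gN is constant.
genes = ["gA", "gN"]
X = [[1, 5], [9, 5], [2, 5], [8, 5], [3, 5], [7, 5]]
y = [0, 1, 0, 1, 0, 1]
gene_sets = {"pathway_A": ["gA"], "pathway_N": ["gN"], "pathway_AN": ["gA", "gN"]}
ranking = rank_gene_sets(X, y, genes, gene_sets)
print(ranking)  # [('pathway_A', 1.0), ('pathway_AN', 1.0), ('pathway_N', 0.5)]
```

With the real library, xgboost.XGBClassifier combined with scikit-learn's cross_val_score would take the place of fit_boosted_stumps and cv_accuracy; the hand-rolled version only makes the split-gain and leaf-weight arithmetic visible.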



          • Published in

            ACM Other conferences
            SETN '18: Proceedings of the 10th Hellenic Conference on Artificial Intelligence
            July 2018
            339 pages
            ISBN:9781450364331
            DOI:10.1145/3200947

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • Research
            • Refereed limited
