ABSTRACT
Motivation: Biological networks reveal the inherent structure of molecular interactions, which can lead to the discovery of driver genes and meaningful pathways, especially in the cancer context. Gene mutations often alter gene expression, and the corresponding gene regulatory network then undergoes some localized re-wiring. Identifying significant changes in interaction patterns caused by disease progression can reveal novel relevant signatures.
Methods: The task of identifying differential sub-networks in paired biological networks (A: control, B: case) can be rephrased as finding dense communities in a single noisy differential topological (DT) graph, constructed by taking the absolute difference between the topological graphs of A and B. In this paper, we propose a fast three-stage approach, Differential Community Detection (DCD), which identifies differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph, using the neighbourhood information of the nodes and Jaccard similarity, to determine the approximate block diagonals present in the DT adjacency matrix. In the second stage, we traverse the ordered DT adjacency matrix along the diagonal and remove all edges associated with a node if that node has no edges to the nodes within a window around it. Finally, we apply community detection methods to this de-noised DT graph to discover differential sub-networks as communities.
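The three stages can be illustrated with a small, stdlib-only Python sketch. It is not the authors' released code: the greedy Jaccard chaining in `reorder` is a simplified, hypothetical stand-in for the paper's iterative re-ordering, the window test in `denoise` is one reading of the second stage, and `components` (plain connected components) replaces a real community detection method such as Louvain or Infomap. All function names and the toy graphs are illustrative assumptions.

```python
# Hypothetical DCD-style sketch on small binary adjacency matrices (not the authors' code).


def to_adj(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def dt_graph(adj_a, adj_b):
    """Differential topological graph: entry-wise |A - B|."""
    n = len(adj_a)
    return [[abs(adj_a[i][j] - adj_b[i][j]) for j in range(n)] for i in range(n)]


def jaccard(s, t):
    """|s & t| / |s | t|, defined as 0.0 when both sets are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def reorder(adj):
    """Stage 1 (simplified assumption): start at the highest-degree node, then
    repeatedly append the unplaced node whose closed neighbourhood is most
    Jaccard-similar to that of the last placed node. Ties go to higher degree,
    then lower index, so a new block tends to start at a hub."""
    n = len(adj)
    closed = [{j for j in range(n) if adj[i][j]} | {i} for i in range(n)]
    degree = [sum(row) for row in adj]
    remaining = set(range(n))
    order = [max(remaining, key=lambda j: (degree[j], -j))]
    remaining.remove(order[0])
    while remaining:
        last = order[-1]
        nxt = max(remaining,
                  key=lambda j: (jaccard(closed[last], closed[j]), degree[j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def denoise(adj, order, window):
    """Stage 2: walk the re-ordered matrix along its diagonal and drop every edge
    of a node that has no edge to any node within `window` positions of it."""
    n = len(adj)
    strip = set()
    for pos, node in enumerate(order):
        lo, hi = max(0, pos - window), min(n, pos + window + 1)
        nearby = [order[q] for q in range(lo, hi) if q != pos]
        if not any(adj[node][other] for other in nearby):
            strip.add(node)
    return [[0 if i in strip or j in strip else adj[i][j] for j in range(n)]
            for i in range(n)]


def components(adj, min_size=2):
    """Stage 3 stand-in: connected components with at least `min_size` nodes."""
    n = len(adj)
    seen, comms = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(comp) >= min_size:
            comms.append(sorted(comp))
    return comms


if __name__ == "__main__":
    # Toy pair: edges (0,1) and (2,4) are shared and cancel in the DT graph; the
    # case network gains a 4-clique on {1,3,5,6} plus one spurious edge (0,6).
    control = to_adj(8, [(0, 1), (2, 4)])
    case = to_adj(8, [(0, 1), (2, 4), (1, 3), (1, 5), (1, 6),
                      (3, 5), (3, 6), (5, 6), (0, 6)])
    dt = dt_graph(control, case)
    order = reorder(dt)
    print(order)                                      # [6, 1, 3, 5, 0, 2, 4, 7]
    print(components(dt))                             # [[0, 1, 3, 5, 6]]
    print(components(denoise(dt, order, window=1)))   # [[1, 3, 5, 6]]
```

On this toy pair, grouping the raw DT graph absorbs node 0 through the spurious edge, while after the window-based de-noising only the clique remains. In this sketch the window size acts as a tuning parameter: a wider window is more lenient and would also keep noise edges that happen to land near the diagonal.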
Results: The proposed DCD approach effectively locates differential sub-networks in several simulated paired random geometric networks and in various paired scale-free graphs with different power-law exponents. In simulation studies, DCD outperforms both community detection methods applied to the original noisy DT graph and recent statistical techniques. We applied the DCD method to two real datasets: a) an ovarian cancer dataset, to discover differential DNA co-methylation sub-networks between patients and controls; b) a glioma dataset, to discover the differences between the regulatory networks of IDH-mutant and IDH-wild-type gliomas. We demonstrate the potential benefits of DCD for finding network-inferred biomarkers and pathways associated with a trait of interest.
Conclusion: The proposed DCD approach overcomes the limitations of previous statistical techniques, as well as the issues associated with identifying differential sub-networks by applying community detection methods to the noisy DT graph. This is reflected in the superior performance of DCD on metrics such as Precision, Accuracy, Kappa and Specificity. The code implementing the proposed DCD method is available at https://sites.google.com/site/raghvendramallmlresearcher/codes.
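The abstract reports Precision, Accuracy, Kappa and Specificity without stating the unit of evaluation. The sketch below assumes, hypothetically, that each unordered node pair is scored as one binary prediction (differential edge or not) against a known simulated ground truth, and that Kappa means Cohen's kappa; the helper names `confusion` and `scores` are illustrative.

```python
# Hypothetical evaluation helper: each unordered node pair is one binary
# prediction ("differential edge" or not). Kappa is assumed to be Cohen's kappa.


def confusion(truth, pred, n_nodes):
    """Counts (tp, fp, fn, tn) for predicted vs. true differential edges.
    Edges are (u, v) tuples with u < v; tn covers all remaining node pairs."""
    total = n_nodes * (n_nodes - 1) // 2
    tp = len(truth & pred)
    fp = len(pred - truth)
    fn = len(truth - pred)
    return tp, fp, fn, total - tp - fp - fn


def scores(tp, fp, fn, tn):
    """Precision, accuracy, Cohen's kappa and specificity from a 2x2 table."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / n
    # Chance agreement from the row and column totals of the 2x2 table.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    # Kappa is undefined when p_e == 1; it is reported as 0.0 here.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"precision": precision, "accuracy": accuracy,
            "kappa": kappa, "specificity": specificity}


if __name__ == "__main__":
    clique = {(1, 3), (1, 5), (1, 6), (3, 5), (3, 6), (5, 6)}
    noisy = clique | {(0, 6)}              # one spurious edge on top of the truth
    print(scores(*confusion(clique, noisy, 8)))
    print(scores(*confusion(clique, clique, 8)))
```

With 8 nodes (28 node pairs) and a single spurious edge, this gives precision 6/7, accuracy 27/28, specificity 21/22 and kappa 0.9, up to floating-point rounding; the exact prediction scores 1.0 on all four. Kappa is the most informative of the four here because it discounts the large number of true negatives that inflate accuracy and specificity on sparse graphs.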
Differential Community Detection in Paired Biological Networks



