ABSTRACT
High-throughput mRNA sequencing (also known as RNA-Seq) promises to be the technique of choice for studying transcriptome profiles. It enables precise methodologies for transcript and gene expression quantification, novel transcript and exon discovery, and splice variant detection. One limitation of current RNA-Seq methods is their dependency on annotated biological features (e.g., exons, transcripts, genes) to detect expression differences across samples. This dependency restricts both the quantification of expression levels and the detection of significant changes to known genomic regions, so any significant changes that occur in unannotated regions are not captured. To overcome this limitation, we developed a novel segmentation approach, Island-Based (IB), for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. The IB segmentation determines individual islands of expression from windowed read counts; these islands can then be compared across experimental conditions to determine differential island expression. To detect differentially expressed genes, the significance values (p-values) of the islands are combined using Fisher's method. We evaluated the performance of our approach by comparing it to three existing differentially expressed gene (DEG) methods, CuffDiff, DESeq, and edgeR, using two benchmark MAQC RNA-Seq datasets. The IB algorithm outperforms all three methods on both datasets, as measured by a higher area under the ROC curve (auROC).
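The two steps named in the abstract, segmenting windowed read counts into islands and combining island p-values with Fisher's method, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_islands` and `fisher_combine`, the `min_count` threshold, and the `max_gap` bridging rule are hypothetical choices made only for illustration, since the abstract does not state how windows are thresholded or merged. The Fisher step uses the standard statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent.

```python
import math
from typing import List, Sequence, Tuple


def find_islands(window_counts: Sequence[int],
                 min_count: int = 1,
                 max_gap: int = 0) -> List[Tuple[int, int]]:
    """Group windows into islands (illustrative rule, not from the paper).

    A window is "expressed" if its read count is >= min_count. Expressed
    windows separated by at most max_gap non-expressed windows are merged
    into one island. Returns inclusive (start, end) window indices.
    """
    islands: List[Tuple[int, int]] = []
    start = None      # first expressed window of the current island
    last_hit = None   # most recent expressed window seen
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue
        if start is None:
            start = i
        elif i - last_hit - 1 > max_gap:
            # Gap too wide: close the current island and open a new one.
            islands.append((start, last_hit))
            start = i
        last_hit = i
    if start is not None:
        islands.append((start, last_hit))
    return islands


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    The chi-square survival function with an even number of degrees of
    freedom (2k) has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!,
    so no external statistics library is needed.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical windowed read counts along one gene.
    counts = [0, 3, 5, 0, 0, 4, 0]
    print(find_islands(counts, min_count=1, max_gap=0))
    # Hypothetical per-island p-values for that gene.
    print(fisher_combine([0.01, 0.02]))
```

In this sketch the per-island p-values are taken as given; how each island is tested for differential expression between conditions is left open here because the abstract does not describe it.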