DOI: 10.1145/2506583.2506589
tutorial

An Island-Based Approach for Differential Expression Analysis

Online: 22 September 2013

ABSTRACT

High-throughput mRNA sequencing (also known as RNA-Seq) promises to be the technique of choice for studying transcriptome profiles. This technique provides the ability to develop precise methodologies for transcript and gene expression quantification, novel transcript and exon discovery, and splice variant detection. One of the limitations of current RNA-Seq methods is their dependency on annotated biological features (e.g., exons, transcripts, genes) to detect expression differences across samples. This dependency restricts the identification of expression levels and the detection of significant changes to known genomic regions, so significant changes that occur in unannotated regions are not captured. To overcome this limitation, we developed a novel segmentation approach, Island-Based (IB), for analyzing differential expression in RNA-Seq and targeted sequencing (exome capture) data without specific knowledge of an isoform. The IB segmentation determines individual islands of expression based on windowed read counts; these islands can then be compared across experimental conditions to determine differential island expression. To detect differentially expressed genes, the p-values of the islands are combined using Fisher's method. We tested and evaluated the performance of our approach by comparing it with three existing differentially expressed gene (DEG) methods (CuffDiff, DESeq, and edgeR) on two benchmark MAQC RNA-Seq datasets. The IB algorithm outperforms all three methods on both datasets, as measured by a higher area under the ROC curve (auROC).
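The two steps named in the abstract, grouping windowed read counts into islands and combining island p-values with Fisher's method, can be illustrated with a minimal sketch. The abstract does not give the segmentation rule, so the `find_islands` function, its `min_count` threshold, and its `max_gap` bridging parameter are hypothetical choices made for illustration only and should not be read as the IB algorithm itself. The `fisher_combine` function implements the standard Fisher statistic, X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent.

```python
import math


def find_islands(window_counts, min_count=5, max_gap=1):
    """Group windows into islands of expression (illustrative rule only).

    A window is "expressed" if its read count is >= min_count. Expressed
    windows separated by at most max_gap non-expressed windows are merged
    into one island. Both parameters are assumptions, not values from the
    paper. Returns a list of (first_window, last_window) index pairs.
    """
    islands = []
    start = None      # first expressed window of the current island
    last_hit = None   # most recent expressed window seen
    for i, count in enumerate(window_counts):
        if count < min_count:
            continue
        if start is None:
            start = i
        elif i - last_hit - 1 > max_gap:
            # The gap is too wide: close the current island, open a new one.
            islands.append((start, last_hit))
            start = i
        last_hit = i
    if start is not None:
        islands.append((start, last_hit))
    return islands


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for even degrees
    of freedom (2k): sf(x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    counts = [0, 8, 9, 0, 7, 0, 0, 0, 6, 0]
    print(find_islands(counts))          # [(1, 4), (8, 8)]
    print(fisher_combine([0.05, 0.05]))  # about 0.0175
```

In a gene-level analysis along these lines, each island overlapping a gene would contribute one p-value from a per-island differential expression test, and `fisher_combine` would yield a single gene-level p-value. Fisher's method assumes the combined p-values are independent, which neighbouring islands within one gene may not strictly satisfy.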

Published in

BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
September 2013
987 pages
ISBN: 9781450324342
DOI: 10.1145/2506583

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

  • Online: 22 September 2013
  • Published: 22 September 2013

Qualifiers

  • tutorial
  • Research
  • Refereed limited

Acceptance Rates

BCB'13 Paper Acceptance Rate: 43 of 148 submissions, 29%
Overall Acceptance Rate: 254 of 885 submissions, 29%
