Abstract
The search for homologous RNA molecules (sequences of RNA that might behave similarly because of similarity in their physical, or secondary, structure) is currently a computationally intensive task. Moreover, RNA sequences are populating genome databases at a pace unmatched by gains in standard processor performance. While software tools such as Infernal can efficiently find homologies among RNA families and genome databases of modest size, the continual arrival of new RNA families and the explosive growth in the volume of RNA sequences necessitate a faster approach.
This work introduces two architectures for accelerating the task of finding homologous RNA molecules in a genome database. The first architecture takes advantage of the tree-like configuration of the covariance models used to represent the consensus secondary structure of an RNA family and converts it directly into a highly pipelined processing engine. Results for this architecture show a 24× speedup over Infernal when processing a small RNA model. It is estimated that the architecture could offer speedups of several thousand times over Infernal on larger models, provided that sufficient hardware resources are available.
The second architecture is introduced to address the steep resource requirements of the first architecture. It utilizes a uniform array of processing elements and schedules all of the computations required to scan for an RNA homolog onto those processing elements. The estimated speedup for this architecture over the Infernal software package ranges from just under 20× to over 2,350×.
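As a rough illustration of what scheduling dependent computations onto a fixed array of processing elements involves, the following is a minimal sketch of greedy list scheduling for unit-time tasks. It is not the scheduler used in this work; the function name `list_schedule`, the `deps` dictionary format, and the unit-time assumption are all hypothetical choices made for the example.

```python
from collections import deque


def list_schedule(deps, num_pes):
    """Greedy list scheduling of unit-time tasks onto num_pes identical PEs.

    deps maps every task to the set of tasks it depends on (assumption:
    each task, including those with no dependencies, appears as a key).
    Returns a list of time steps; each step lists the tasks run in parallel.
    """
    remaining = {task: set(parents) for task, parents in deps.items()}
    dependents = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            dependents[parent].append(task)

    # Tasks with no unmet dependencies are ready to run.
    ready = deque(sorted(t for t, parents in remaining.items() if not parents))
    schedule = []
    done = 0
    while ready:
        # Fill at most num_pes processing elements in this time step.
        step = [ready.popleft() for _ in range(min(num_pes, len(ready)))]
        schedule.append(step)
        done += len(step)
        # Results become visible only after the step finishes.
        for task in step:
            for child in dependents[task]:
                remaining[child].discard(task)
                if not remaining[child]:
                    ready.append(child)
    if done != len(deps):
        raise ValueError("dependency cycle detected")
    return schedule


if __name__ == "__main__":
    example = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
    for t, step in enumerate(list_schedule(example, num_pes=2)):
        print(t, step)
```

With more processing elements, independent tasks share a time step, so the schedule gets shorter until the dependency chain itself becomes the limit.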
Index Terms
Hardware-Accelerated RNA Secondary-Structure Alignment