research-article

Hardware-Accelerated RNA Secondary-Structure Alignment

Published: 01 September 2010

Abstract

The search for homologous RNA molecules (sequences of RNA that might behave similarly because their physical, or secondary, structures are similar) is currently a computationally intensive task. Moreover, RNA sequences are populating genome databases at a pace unmatched by gains in standard processor performance. While software tools such as Infernal can efficiently find homologies among RNA families and genome databases of modest size, the steady arrival of new RNA families and the explosive growth in the volume of RNA sequences call for a faster approach.
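To make that cost concrete, here is a deliberately simplified stand-in: the Nussinov base-pair-maximization recurrence, a classic secondary-structure dynamic program. It is not the covariance-model algorithm that Infernal uses. The function name and the minimum hairpin-loop length of 3 are illustrative assumptions. The inner loop over the pairing partner `k` makes the run time cubic in sequence length, which is the same growth seen in CYK-style parsing (Cocke, Kasami, and Younger in the reference list).

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    Illustrative stand-in only, not Infernal's covariance-model scoring.
    min_loop is an assumed minimum number of unpaired bases in a hairpin.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j]; spans too short for a hairpin stay 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For example, `nussinov_max_pairs("GGGAAAUCC")` returns 3 (two G-C pairs and one G-U pair closing an AAA loop). Because of the three nested loops, doubling the sequence length multiplies the work by roughly eight.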

This work introduces two different architectures for accelerating the search for homologous RNA molecules in a genome database. The first architecture takes advantage of the tree-like configuration of the covariance models used to represent the consensus secondary structure of an RNA family and converts it directly into a highly pipelined processing engine. Results for this architecture show a 24× speedup over Infernal when processing a small RNA model. It is estimated that, on larger models, the architecture could offer speedups of several thousand times over Infernal, provided that sufficient hardware resources are available.
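One way to picture how a tree-shaped model can become a pipeline: a node can only be scored after its children, so each node can be placed one stage after its deepest child, with the leaves in the first stage. The sketch below is a hypothetical illustration of that dependency ordering only; the `Node` class, the node names, and the stage-assignment rule are assumptions, not the article's actual hardware mapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a tree-structured model (not Infernal's file format)."""
    name: str
    children: list[Node] = field(default_factory=list)


def assign_stages(root: Node) -> dict[str, int]:
    """Give each node a stage one past its deepest child; leaves get stage 0."""
    stages: dict[str, int] = {}

    def visit(node: Node) -> int:
        child_stages = [visit(c) for c in node.children]
        stage = 1 + max(child_stages, default=-1)
        stages[node.name] = stage
        return stage

    visit(root)
    return stages


def pipeline_layout(root: Node) -> list[list[str]]:
    """Group node names by stage; each group is a candidate pipeline stage."""
    stages = assign_stages(root)
    layout: list[list[str]] = [[] for _ in range(max(stages.values()) + 1)]
    for name, stage in stages.items():
        layout[stage].append(name)
    return layout


# Usage: a bifurcation whose left branch is deeper than its right branch.
root = Node("bif", [Node("left", [Node("l_leaf")]), Node("right")])
print(pipeline_layout(root))  # [['l_leaf', 'right'], ['left'], ['bif']]
```

Nodes in the same stage have no dependencies on each other, so in this picture they could be evaluated side by side while later stages consume their results.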

The second architecture addresses the steep resource requirements of the first. It uses a uniform array of processing elements and schedules all of the computations required to scan for an RNA homolog onto those processing elements. The estimated speedup of this architecture over the Infernal software package ranges from just under 20× to over 2,350×.
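Scheduling a fixed set of computations onto a uniform array is a classic static-scheduling problem (see Hu; Coffman and Graham; and Kwok and Ahmad in the reference list). As a rough sketch, and not the article's own scheduler, the code below applies greedy highest-level-first list scheduling to unit-time tasks: at each time step, up to `num_pes` ready tasks run, preferring those at the head of the longest remaining dependency chain. The task-graph format (every task listed as a key of `deps`) and the task names are assumptions.

```python
from collections import defaultdict


def list_schedule(deps: dict[str, list[str]], num_pes: int) -> list[list[str]]:
    """Highest-level-first list scheduling of unit-time tasks onto num_pes PEs.

    deps maps every task (all tasks must be keys) to the tasks it depends on.
    Each inner list of the result is one time step with at most num_pes tasks.
    """
    successors: dict[str, list[str]] = defaultdict(list)
    for task, preds in deps.items():
        for p in preds:
            successors[p].append(task)

    # Level = number of tasks on the longest path from this task to a sink.
    level: dict[str, int] = {}

    def get_level(t: str) -> int:
        if t not in level:
            level[t] = 1 + max((get_level(s) for s in successors[t]), default=0)
        return level[t]

    for t in deps:
        get_level(t)

    remaining = {t: len(preds) for t, preds in deps.items()}
    ready = [t for t, n in remaining.items() if n == 0]
    schedule: list[list[str]] = []
    while ready:
        # Prefer tasks on longer chains; break ties by name for determinism.
        ready.sort(key=lambda t: (-level[t], t))
        step, ready = ready[:num_pes], ready[num_pes:]
        schedule.append(step)
        # Successors become ready only after this step finishes.
        for t in step:
            for s in successors[t]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    ready.append(s)
    return schedule


deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
print(list_schedule(deps, num_pes=2))  # [['a', 'b'], ['c', 'e'], ['d']]
```

Deferring the short chain (`e`) in favor of the tasks feeding the long chain keeps the array busy and finishes in three steps instead of four.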

References

  1. Ahmadi, H. and Denzel, W. 1989. A survey of modern high-performance switching techniques. IEEE J. Select. Areas Comm. 7, 7, 1091--1103.
  2. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990. Basic local alignment search tool. J. Molecul. Biol. 215, 403--410.
  3. Aluru, S., Futamura, N., and Mehrotra, K. 2003. Parallel biological sequence comparison using prefix computations. J. Paral. Distrib. Comput. 63, 3, 264--272.
  4. Batcher, K. E. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference 32. 307--314.
  5. Brown, M. P. S. 2000. Small subunit ribosomal RNA modeling using stochastic context-free grammars. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology. AAAI Press, 57--66.
  6. Cocke, J. 1969. Programming Languages and Their Compilers: Preliminary Notes. Courant Institute of Mathematical Sciences, New York University.
  7. Coffman, E. and Graham, R. 1972. Optimal scheduling for two-processor systems. Acta Inf. 1, 3, 200--213.
  8. Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
  9. Eddy, S. R. 2006. Computational analysis of RNAs. Cold Spring Harbor Symp. Quantitat. Biol. 71, 1, 117--128.
  10. Eddy, S. R. and Durbin, R. 1994. RNA sequence analysis using covariance models. Nucleic Acids Res. 22, 11, 2079--2088.
  11. Fishburn, P. 1985. Interval Orders and Interval Graphs. John Wiley & Sons, New York.
  12. Goke, L. R. and Lipovski, G. J. 1973. Banyan networks for partitioning multiprocessor systems. In Proceedings of the 1st Annual Symposium on Computer Architecture (ISCA’73). 21--28.
  13. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S. R., and Bateman, A. 2005. Rfam: Annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33.
  14. HMMER. HMMER website. http://hmmer.janelia.org (July 2007).
  15. Hu, T. C. 1961. Parallel sequencing and assembly line problems. Oper. Res. 9, 6, 841--848.
  16. Infernal. Infernal website. http://infernal.janelia.org.
  17. Kasami, T. 1965. An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific rep. AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, MA.
  18. Kwok, Y.-K. and Ahmad, I. 1999. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31, 4, 406--471.
  19. Lenhof, H.-P., Reinert, K., and Vingron, M. 1998. A polyhedral approach to RNA sequence structure alignment. J. Comput. Biol. 5, 3, 517--530.
  20. Liu, T. and Schmidt, B. 2005. Parallel RNA secondary structure prediction using stochastic context-free grammars. Concurr. Comput. Pract. Exper. 17, 14, 1669--1685.
  21. Lowe, T. and Eddy, S. R. 1997. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 5, 955--964.
  22. Mattick, J. S. 2003. Challenging the dogma: The hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25, 930--939.
  23. Moscola, J. 2008. Techniques for hardware-accelerated parsing for network and bioinformatic applications. Ph.D. thesis, Washington University in St. Louis, St. Louis, MO.
  24. Nawrocki, E. P. and Eddy, S. R. 2007. Query-Dependent banding (QDB) for faster RNA similarity searches. PLoS Comput. Biol. 3, 3.
  25. Pearson, W. R. and Lipman, D. J. 1988. Improved tools for biological sequence comparison. Proc. Nat. Acad. Sci. 85, 8, 2444--2448.
  26. Rivas, E. and Eddy, S. R. 2001. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinf. 2, 1.
  27. RNACAD. RNACAD website. http://www.soe.ucsc.edu/~mpbrown/rnacad/.
  28. Schmidt, B., Schroder, H., and Schimmler, M. 2002. Massively parallel solutions for molecular sequence analysis. In Proceedings of the 1st International Workshop on High Performance Computational Biology (IPDPS’02).
  29. Searls, D. 1992. The linguistics of DNA. Amer. Sci. 80, 579--591.
  30. Shiveley, R. 2006. Dual-core Intel Itanium 2 processors deliver unbeatable flexibility and performance to the enterprise. Technol. Intel Mag.
  31. Srinivas, M. and Patnaik, L. M. 1994. Genetic algorithms: A survey. IEEE Comput. 27, 6, 17--26.
  32. Storz, G. 2002. An expanding universe of noncoding RNAs. Sci. 296, 5571, 1260--1263.
  33. Vienna. Vienna RNA software website. http://www.tbi.univie.ac.at/RNA/.
  34. Washietl, S., Hofacker, I. L., Lukasser, M., Huttenhofer, A., and Stadler, P. F. 2003. Mapping of conserved RNA secondary structures predicts thousands of functional non-coding RNAs in the human genome. Nature Biotechnol. 23, 1390--1390.
  35. Weinberg, Z. and Ruzzo, W. L. 2004. Faster genome annotation of non-coding RNA families without loss of accuracy. In Proceedings of the 8th Annual International Conference on Research in Computational Molecular Biology (RECOMB’04). 243--251.
  36. Weinberg, Z. and Ruzzo, W. L. 2006. Sequence-Based heuristics for faster annotation of non-coding RNA families. Bioinf. 22, 1, 35--39.
  37. Younger, D. 1967. Recognition and parsing of context-free languages in time O(n³). Inform. Control 10, 2, 189--208.

          Reviews

          Ned Chapin

          Computational support for biologists' work on genomes has placed heavy demands on computer resources. This bioinformatics paper proposes ways to reduce the runtimes required for one important computational task in molecular biology. The reported improvement factor (depending on specifics) ranges from 20 to over 2,000, compared with the runtimes of the software tool usually employed for the task (Infernal), which range from a few minutes to a year depending on the instance.

          The computational task of concern is finding noncoding RNA molecules whose secondary structures (the patterns of base pairing that shape how the molecules fold) resemble those of a known RNA family. These structures can be extremely complex. Noncoding RNA molecules are not translated into proteins, but they are critically involved in many other functions of the cells that make up living things. Noncoding RNA molecules with similar secondary structures may have similar biological functions, even when their sequences and environments are very different.

          This paper opens with a brief abstract and an introduction devoted mostly to orientation. "Baseline Architecture" is the central and key section of the paper, and includes some runtime comparisons for a highly pipelined architecture. Other sections, including "Processor Array Architecture," "Scheduling Computations on the Processor Array," and "Analysis of the Processor Array Architecture," emphasize computational resource requirements. The paper ends with a short summary and a list of 37 references.

          It's good to see a paper that addresses ways to alleviate a significant difficulty in bioinformatics and, in addition, carries forward work done for a PhD. Both information technology and bioinformatics are fast-moving fields, and I noticed two potentially very relevant matters that receive essentially no explicit coverage (positive or negative) in this September 2010 paper. While one of the references (published in 2002) provides some older coverage of parallel processing, none of the references specifically covers virtualization, let alone in a good, recent treatment. Even so, it appears to me that applying virtualization technology, supported by parallel processing technology, is very likely to enable further major performance improvements by recasting some of what this paper describes.
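          As a back-of-the-envelope reading of the figures quoted in the review, the sketch below divides a year-long software run by the quoted speedup factors. It assumes, purely for illustration, that a speedup applies uniformly to the whole job; neither the review nor the abstract says which speedups correspond to which runtimes, and the function name is hypothetical.

```python
# Illustrative arithmetic only: assumes the speedup applies to the entire job.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def accelerated_hours(software_hours: float, speedup: float) -> float:
    """Hypothetical helper: runtime after acceleration under a uniform speedup."""
    return software_hours / speedup


for speedup in (20, 2_000, 2_350):
    hours = accelerated_hours(HOURS_PER_YEAR, speedup)
    print(f"{speedup:>5}x: {hours:8.1f} hours")
```

          Under that assumption, a year-long run would drop to about 438 hours (roughly 18 days) at 20×, and to under 4 hours at 2,350×.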

          • Published in

            ACM Transactions on Reconfigurable Technology and Systems, Volume 3, Issue 3
            September 2010
            231 pages
            ISSN:1936-7406
            EISSN:1936-7414
            DOI:10.1145/1839480

            Copyright © 2010 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 September 2010
            • Accepted: 1 June 2009
            • Revised: 1 April 2009
            • Received: 1 December 2008

            Qualifiers

            • research-article
            • Research
            • Refereed
