Abstract
Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built Mercury BLASTP, a high-performance FPGA-accelerated version of BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA and their integration with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while producing results that are close to 99% identical.
Mercury BLASTP: Accelerating Protein Sequence Alignment