Abstract
This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The algorithm presented in this paper provides additional parallelism, allowing multiple stages to be computed in parallel despite the dependences among them. The correctness and performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation.
This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24x faster (with 64 processors) than a highly optimized commercial baseline.
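The tropical semiring mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names (`tropical_matvec`, `tropical_matmul`, `is_rank_one`, `are_parallel`) and the sample matrices `A` and `B` are hypothetical, chosen only for illustration, and the rank test assumes every entry is finite. The sketch shows a product of two stage matrices collapsing to tropical rank 1, after which any two starting vectors yield results that differ only by a constant offset.

```python
def tropical_matvec(A, v):
    """Tropical matrix-vector product: result[i] = max_k (A[i][k] + v[k])."""
    return [max(a + x for a, x in zip(row, v)) for row in A]


def tropical_matmul(A, B):
    """Tropical matrix product: result[i][j] = max_k (A[i][k] + B[k][j])."""
    cols = list(zip(*B))
    return [[max(a + b for a, b in zip(row, col)) for col in cols] for row in A]


def are_parallel(u, v):
    """Two vectors are parallel if they differ by a single constant offset."""
    return len({a - b for a, b in zip(u, v)}) == 1


def is_rank_one(M):
    """Rank 1 in the tropical sense (finite entries assumed):
    every column equals the first column plus a constant."""
    first = [row[0] for row in M]
    return all(are_parallel([row[j] for row in M], first)
               for j in range(len(M[0])))


# Hypothetical stage matrices with integer scores (illustrative values only).
A = [[2, 0],
     [1, 0]]
B = [[0, 3],
     [0, 1]]

# Neither stage is rank 1 on its own, but their product is.
M = tropical_matmul(A, B)

# Two unrelated starting vectors give parallel results after applying M.
out1 = tropical_matvec(M, [0, 0])
out2 = tropical_matvec(M, [5, -7])

print(M, is_rank_one(M))                 # [[2, 5], [1, 4]] True
print(out1, out2, are_parallel(out1, out2))  # [5, 4] [7, 6] True
```

Because parallel vectors select the same maximizing predecessors, a processor that starts a later stage from an arbitrary vector can, under this assumption of rank convergence, still recover the same choices as the sequential computation up to a constant offset.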
Parallelizing dynamic programming through rank convergence
PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming