research-article

Parallelizing dynamic programming through rank convergence

Published: 06 February 2014

Abstract

This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The algorithm presented in this paper provides additional parallelism, allowing multiple stages to be computed in parallel despite the dependences among them. The correctness and performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation.

This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24x faster (with 64 processors) than a highly optimized commercial baseline.
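As an illustration of the tropical semiring mentioned in the abstract, the following minimal Python sketch treats one dynamic programming stage as a max-plus matrix-vector product and shows the rank convergence effect on a toy example. It is not the paper's implementation: the function names, the 2x2 matrix `A`, and the starting vectors are all assumptions chosen for illustration. The point it tries to convey is that once the product of the stage matrices has tropical rank 1, any two starting vectors yield results that differ only by a constant offset, which is why a later block of stages can be started from a guess before the earlier stages have finished.

```python
# Toy stage matrix (hypothetical, chosen for illustration). A itself is not
# tropical rank 1, but the max-plus product of A with itself is.
A = [[2, 0],
     [0, 0]]


def tropical_matvec(matrix, vec):
    """Max-plus product: result[i] = max over j of (matrix[i][j] + vec[j])."""
    return [max(a + x for a, x in zip(row, vec)) for row in matrix]


def run_stages(matrices, vec):
    """Apply the stage matrices one after another, as a sequential DP would."""
    for matrix in matrices:
        vec = tropical_matvec(matrix, vec)
    return vec


def are_parallel(u, v):
    """True if u and v differ by the same constant in every component."""
    return len({a - b for a, b in zip(u, v)}) == 1


if __name__ == "__main__":
    true_start = [0, 0]   # the real solution vector entering this block
    guess_start = [0, 7]  # an arbitrary guess used before the real one is known

    # After one stage the two runs still disagree in shape.
    print(run_stages([A], true_start), run_stages([A], guess_start))
    # prints [2, 0] [7, 7]

    # After two stages they differ only by a constant offset of 5.
    exact = run_stages([A, A], true_start)
    guessed = run_stages([A, A], guess_start)
    print(exact, guessed, are_parallel(exact, guessed))
    # prints [4, 2] [9, 7] True
```

In this sketch the guessed run ranks the subproblems of the final stage exactly as the exact run does, so only the constant offset remains to be corrected. How many stages are needed before the results become parallel depends on the problem instance, and this toy example makes no claim about that.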

