Parallel suffix array and least common prefix for the GPU

Published: 23 February 2013

Abstract

A suffix array (SA) is a data structure formed by sorting the suffixes of a string into lexicographic order. SAs have been used in a variety of applications, most notably in pattern matching and in lossless data compression based on the Burrows-Wheeler Transform (BWT). SAs have also become the data structure of choice for many, if not all, string processing problems to which suffix tree methodology is applicable. Over the last two decades researchers have proposed many suffix array construction algorithms (SACAs). We do a systematic study of the main classes of SACAs with the intent of mapping them onto a data-parallel architecture like the GPU. We conclude that the skew algorithm [12], a linear-time recursive algorithm, is the best candidate for GPUs, as all its phases can be efficiently mapped to data-parallel hardware. Our OpenCL implementation of the skew algorithm achieves a throughput of up to 25 MStrings/sec and a speedup of up to 34x and 5.8x over a single-threaded CPU implementation using a discrete GPU and an APU, respectively. We also compare our OpenCL implementation against the fastest known CPU implementation based on induced copying and achieve a speedup of up to 3.7x. Using the SA, we construct the BWT on the GPU and achieve a speedup of 11x over the fastest known BWT on the GPU.
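
To make the definitions concrete, here is a minimal Python sketch of a suffix array and of the BWT derived from it. It is not the skew algorithm or the OpenCL implementation described in the abstract: it sorts the suffixes naively, which is far slower than linear time, and it assumes the input ends with a unique sentinel character `$` that compares smaller than every other character. The function names `suffix_array` and `bwt_from_sa` are illustrative only.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix text.

    Assumption: text ends with a unique sentinel ('$') smaller than all
    other characters. This is for illustration only, not a linear-time SACA.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT: for each suffix in sorted order, take the character before it.

    For the suffix starting at position 0, text[-1] wraps around to the
    sentinel, which is the cyclic predecessor.
    """
    return "".join(text[i - 1] for i in sa)


text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
```

The BWT falls out of the suffix array with one gather per position, which is why a fast SA construction translates directly into a fast BWT.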

Suffix arrays are often augmented with longest common prefix (LCP) information. We design a novel high-performance parallel algorithm for computing the LCP on the GPU. Our GPU implementation of LCP achieves a speedup of up to 25x and 4.3x on a discrete GPU and an APU, respectively.
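
As an illustration of what the LCP array contains, the sketch below computes it sequentially in Python in the style of the linear-time method of Kasai et al. [15]. It is not the parallel GPU algorithm announced in the abstract, whose details are not given here. The helper `naive_suffix_array` and the `$` sentinel are assumptions made to keep the example self-contained.

```python
def naive_suffix_array(text):
    """Illustrative helper: sort suffix start positions by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Sequential LCP computation in the style of Kasai et al.

    lcp[r] is the length of the longest common prefix of the suffixes at
    sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    """
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r          # rank is the inverse permutation of sa

    lcp = [0] * n
    h = 0                    # number of matched characters carried over
    for i in range(n):
        if rank[i] == 0:
            h = 0            # no predecessor in sorted order
            continue
        j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1           # the next suffix shares at least h - 1 characters
    return lcp


text = "banana$"
sa = naive_suffix_array(text)
lcp = lcp_array(text, sa)
print(lcp)  # [0, 0, 1, 3, 0, 0, 2]
```

The carried-over counter `h` is what makes this approach linear on a CPU, but it also creates a dependency between consecutive iterations, so a GPU algorithm has to organize the work differently.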

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 2 (1): 53--86, Mar. 2004.
  2. AMD. AMD accelerated parallel processing programming guide. http://developer.amd.com/sdks/amdappsdk/assets/amd_accelerated_parallel_processing_opencl_programming_guide.pdf, 2012.
  3. AMD. AMD Southern Island instruction set architecture. http://developer.amd.com/sdks/amdappsdk/assets/AMD_Southern_islands_Instruction_Set_Architecture.pdf, 2012.
  4. J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms, SODA '97, pages 360--369, Philadelphia, PA, USA, 1997.
  5. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California, 1994.
  6. A. Davidson, D. Tarjan, M. Garland, and J. Owens. Efficient parallel merge sort for fixed and variable length keys. In Innovative Parallel Computing (InPar), 2012, pages 1--9, May 2012.
  7. M. Farach. Optimal suffix tree construction with large alphabets. In Foundations of Computer Science, 1997. Proceedings., 38th Annual Symposium on, pages 137--143, Oct. 1997.
  8. P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 390--398, 2000.
  9. Khronos OpenCL Working Group. OpenCL 1.2 specifications. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf, 2012.
  10. R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkESA: enhanced suffix array construction tool. Bioinformatics, 25 (8): 1084--1085, 2009.
  11. H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware, SPIRE '99, pages 81--88, Washington, DC, USA, 1999.
  12. J. Karkkainen and P. Sanders. Simple linear work suffix array construction. In Automata, Languages and Programming, volume 2719 of Lecture Notes in Computer Science, pages 943--955. 2003.
  13. J. Karkkainen, P. Sanders, and S. Burkhardt. Linear work suffix array construction. J. ACM, 53 (6): 918--936, Nov. 2006.
  14. R. M. Karp, R. E. Miller, and A. L. Rosenberg. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the fourth annual ACM symposium on Theory of computing, STOC '72, pages 125--136, New York, NY, USA, 1972.
  15. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, CPM '01, pages 181--192, London, UK, 2001.
  16. F. Kulla and P. Sanders. Scalable parallel suffix array construction. Parallel Computing, 33 (9): 605--612, 2007.
  17. N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LU-CSTR: 99--214, Dept. of Computer Science, Lund University, Sweden, 1999.
  18. U. Manber and G. W. Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the first ACM-SIAM Symposium on Discrete Algorithms, pages 319--327, 1990.
  19. G. Manzini. Two space saving tricks for linear time LCP array computation. In Algorithm Theory - SWAT 2004, volume 3111 of Lecture Notes in Computer Science, pages 372--383. 2004.
  20. G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. In Proc. 10th Annual European Symposium on Algorithms, volume 2461 of Lecture Notes in Computer Science, pages 698--710. 2002.
  21. E. M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23 (2): 262--272, Apr. 1976.
  22. P. McIlroy, K. Bostic, and M. McIlroy. Engineering radix sort. Computing Systems, 6 (1): 5--27, 1993.
  23. D. Merrill and A. Grimshaw. Revisiting sorting for GPGPU stream architectures. Technical Report CS2010-03, Department of Computer Science, University of Virginia, 2010.
  24. H. Mohamed and M. Abouelhoda. Parallel suffix sorting based on bucket pointer refinement. In Biomedical Engineering Conference (CIBEC), 2010 5th Cairo International, pages 98--102, Dec. 2010.
  25. Y. Mori. libdivsufsort, version 2.0.1. http://code.google.com/p/libdivsufsort/, 2010.
  26. S. A. N. Futamura and S. Kurtz. Parallel suffix sorting. In Proceedings 9th International Conference on Advanced Computing and Communications, pages 76--81, 2001.
  27. NVIDIA. NVIDIA's next generation CUDA compute architecture: Kepler GK110. http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf, 2012.
  28. R. Patel, Y. Zhang, J. Mak, A. Davidson, and J. Owens. Parallel lossless data compression on the GPU. In Innovative Parallel Computing (InPar), 2012, pages 1--9, May 2012.
  29. S. J. Puglisi, W. F. Smyth, and A. H. Turpin. A taxonomy of suffix array construction algorithms. ACM Comput. Surv., 39 (2), July 2007.
  30. N. Satish, M. Harris, and M. Garland. Designing efficient sorting algorithms for manycore GPUs. In Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1--10, May 2009.
  31. K. Schürmann and J. Stoye. An incomplex algorithm for fast suffix array construction. In Proc. 7th Workshop on Algorithm Engineering and Experiments and 2nd Workshop on Analytic Algorithmics and Combinatorics (ALENEX/ANALCO), pages 77--85, 2005.
  32. J. Seward. On the performance of BWT sorting algorithms. In Data Compression Conference, 2000. Proceedings. DCC 2000, pages 173--182, 2000.
  33. J. Seward. http://www.bzip2.org, 2012.
  34. J. Soman, M. Kumar, K. Kothapalli, and P. Narayanan. Efficient discrete range searching primitives on the GPU with applications. In High Performance Computing (HiPC), 2010 International Conference on, pages 1--10, Dec. 2010.
  35. P. Weiner. Linear pattern matching algorithms. In Switching and Automata Theory, 1973. SWAT '08. IEEE Conference Record of 14th Annual Symposium on, pages 1--11, Oct. 1973.
