Abstract
A suffix array (SA) is a data structure formed by sorting the suffixes of a string into lexicographic order. SAs have been used in a variety of applications, most notably pattern matching and Burrows-Wheeler Transform (BWT) based lossless data compression. SAs have also become the data structure of choice for many, if not all, string processing problems to which suffix tree methodology is applicable. Over the last two decades, researchers have proposed many suffix array construction algorithms (SACAs). We systematically study the main classes of SACAs with the intent of mapping them onto a data-parallel architecture such as the GPU. We conclude that the skew algorithm [12], a linear-time recursive algorithm, is the best candidate for GPUs, as all its phases can be efficiently mapped to data-parallel hardware. Our OpenCL implementation of the skew algorithm achieves a throughput of up to 25 MStrings/sec and speedups of up to 34x and 5.8x over a single-threaded CPU implementation using a discrete GPU and an APU, respectively. We also compare our OpenCL implementation against the fastest known CPU implementation, which is based on induced copying, and achieve a speedup of up to 3.7x. Using the SA, we construct the BWT on the GPU and achieve a speedup of 11x over the fastest known BWT on the GPU.
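To make the definitions above concrete, here is a minimal sequential sketch in Python. It is not the skew algorithm and not the OpenCL implementation described in the abstract; it builds the suffix array by naively sorting suffixes and then reads the BWT off it. The function names and the `$` sentinel are assumptions made for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix they begin.

    Illustration only; this is far slower than a linear-time SACA.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: list[int]) -> str:
    """BWT of a sentinel-terminated text: the character preceding each sorted suffix."""
    # For i == 0, text[i - 1] wraps around to the last character (the sentinel).
    return "".join(text[i - 1] for i in sa)


# '$' is an assumed sentinel that sorts before every other character in the text.
text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
```

Once the SA is available, the BWT needs only one independent lookup per position, which is the kind of step that maps naturally to data-parallel hardware.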
Suffix arrays are often augmented with longest common prefix (LCP) information. We design a novel high-performance parallel algorithm for computing the LCP on the GPU. Our GPU implementation of LCP achieves speedups of up to 25x and 4.3x on a discrete GPU and an APU, respectively.
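As a point of reference for what the LCP array contains, the sketch below uses the sequential linear-time method of Kasai et al., which appears in the reference list. It is not the parallel GPU algorithm proposed in this paper. The function name and the naive suffix sort used to obtain the SA are assumptions for illustration.

```python
def lcp_kasai(text: str, sa: list[int]) -> list[int]:
    """lcp[r] = length of the longest common prefix of suffixes sa[r-1] and sa[r].

    lcp[0] is defined as 0 because the first suffix has no predecessor.
    """
    n = len(text)
    rank = [0] * n                  # rank[i] = position of suffix i in the SA
    for r, i in enumerate(sa):
        rank[i] = r

    lcp = [0] * n
    h = 0                           # current match length, reused across iterations
    for i in range(n):              # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]     # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1              # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


text = "banana$"  # '$' is an assumed sentinel
sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive SA, illustration only
lcp = lcp_kasai(text, sa)
print(lcp)  # [0, 0, 1, 3, 0, 0, 2]
```

The loop carries `h` from one iteration to the next, so this formulation is inherently sequential; that dependency is one reason a GPU version needs a different algorithm.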
- M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 2 (1): 53--86, Mar. 2004.
- AMD. AMD accelerated parallel processing programming guide. http://developer.amd.com/sdks/amdappsdk/assets/amd_accelerated_parallel_processing_opencl_programming_guide.pdf, 2012.
- AMD. AMD Southern Island instruction set architecture. http://developer.amd.com/sdks/amdappsdk/assets/AMD_Southern_islands_Instruction_Set_Architecture.pdf, 2012.
- J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms, SODA '97, pages 360--369, Philadelphia, PA, USA, 1997.
- M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, California, 1994.
- A. Davidson, D. Tarjan, M. Garland, and J. Owens. Efficient parallel merge sort for fixed and variable length keys. In Innovative Parallel Computing (InPar), 2012, pages 1--9, May 2012.
- M. Farach. Optimal suffix tree construction with large alphabets. In Foundations of Computer Science, 1997. Proceedings., 38th Annual Symposium on, pages 137--143, Oct. 1997.
- P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 390--398, 2000.
- Khronos OpenCL Working Group. OpenCL 1.2 specifications. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf, 2012.
- R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkESA: enhanced suffix array construction tool. Bioinformatics, 25 (8): 1084--1085, 2009.
- H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware, SPIRE '99, pages 81--88, Washington, DC, USA, 1999.
- J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Automata, Languages and Programming, volume 2719 of Lecture Notes in Computer Science, pages 943--955. 2003.
- J. Kärkkäinen, P. Sanders, and S. Burkhardt. Linear work suffix array construction. J. ACM, 53 (6): 918--936, Nov. 2006.
- R. M. Karp, R. E. Miller, and A. L. Rosenberg. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the fourth annual ACM symposium on Theory of computing, STOC '72, pages 125--136, New York, NY, USA, 1972.
- T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, CPM '01, pages 181--192, London, UK, 2001.
- F. Kulla and P. Sanders. Scalable parallel suffix array construction. Parallel Computing, 33 (9): 605--612, 2007.
- N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LU-CS-TR:99-214, Dept. of Computer Science, Lund University, Sweden, 1999.
- U. Manber and G. W. Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the first ACM-SIAM Symposium on Discrete Algorithms, pages 319--327, 1990.
- G. Manzini. Two space saving tricks for linear time LCP array computation. In Algorithm Theory - SWAT 2004, volume 3111 of Lecture Notes in Computer Science, pages 372--383. 2004.
- G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. In Proc. 10th Annual European Symposium on Algorithms, volume 2461 of Lecture Notes in Computer Science, pages 698--710. 2002.
- E. M. McCreight. A space-economical suffix tree construction algorithm. J. ACM, 23 (2): 262--272, Apr. 1976.
- P. McIlroy, K. Bostic, and M. McIlroy. Engineering radix sort. Computing Systems, 6 (1): 5--27, 1993.
- D. Merrill and A. Grimshaw. Revisiting sorting for GPGPU stream architectures. Technical Report CS2010-03, Department of Computer Science, University of Virginia, 2010.
- H. Mohamed and M. Abouelhoda. Parallel suffix sorting based on bucket pointer refinement. In Biomedical Engineering Conference (CIBEC), 2010 5th Cairo International, pages 98--102, Dec. 2010.
- Y. Mori. libdivsufsort, version 2.0.1. http://code.google.com/p/libdivsufsort/, 2010.
- N. Futamura, S. Aluru, and S. Kurtz. Parallel suffix sorting. In Proceedings 9th International Conference on Advanced Computing and Communications, pages 76--81, 2001.
- NVIDIA. NVIDIA's next generation CUDA compute architecture: Kepler GK110. http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf, 2012.
- R. Patel, Y. Zhang, J. Mak, A. Davidson, and J. Owens. Parallel lossless data compression on the GPU. In Innovative Parallel Computing (InPar), 2012, pages 1--9, May 2012.
- S. J. Puglisi, W. F. Smyth, and A. H. Turpin. A taxonomy of suffix array construction algorithms. ACM Comput. Surv., 39 (2), July 2007.
- N. Satish, M. Harris, and M. Garland. Designing efficient sorting algorithms for manycore GPUs. In Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1--10, May 2009.
- K. Schürmann and J. Stoye. An incomplex algorithm for fast suffix array construction. In Proc. 7th Workshop on Algorithm Engineering and Experiments and 2nd Workshop on Analytic Algorithmics and Combinatorics (ALENEX/ANALCO), pages 77--85, 2005.
- J. Seward. On the performance of BWT sorting algorithms. In Data Compression Conference, 2000. Proceedings. DCC 2000, pages 173--182, 2000.
- J. Seward. http://www.bzip2.org, 2012.
- J. Soman, M. Kumar, K. Kothapalli, and P. Narayanan. Efficient discrete range searching primitives on the GPU with applications. In High Performance Computing (HiPC), 2010 International Conference on, pages 1--10, Dec. 2010.
- P. Weiner. Linear pattern matching algorithms. In Switching and Automata Theory, 1973. SWAT '08. IEEE Conference Record of 14th Annual Symposium on, pages 1--11, Oct. 1973.
Index Terms
Parallel suffix array and least common prefix for the GPU
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming