Abstract
The threaded many-core memory (TMM) model provides a framework for analyzing the performance of algorithms on GPUs. Here, we investigate the effectiveness of the TMM model by analyzing algorithms for three classic problems (suffix tree/array for string matching, fast Fourier transform, and merge sort) under this model. Our findings indicate that the TMM model can explain and predict previously unexplained trends and artifacts in experimental data.
References
- G. Encarnação, N. Sebastião, and N. Roma. Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays. In Proc. of HPCS, 2011.
- N. K. Govindaraju et al. High performance discrete Fourier transforms on graphics processors. In Proc. of SC, 2008.
- L. Ma, K. Agrawal, and R. D. Chamberlain. A memory access model for highly-threaded many-core architectures. Future Generation Computer Systems, 30: 202--215, January 2014.
- N. Satish et al. Designing efficient sorting algorithms for manycore GPUs. In Proc. of IPDPS, 2009.
Theoretical analysis of classic algorithms on highly-threaded many-core GPUs
PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming