Abstract
Developing high-performance GPGPU programs is challenging for application developers because performance depends on how well the code exploits the hardware features of a specific graphics processor. To address this problem and relieve application developers of low-level, hardware-specific optimizations, we introduce a novel compiler that optimizes GPGPU programs. Our compiler takes as input a naive GPU kernel function, one that is functionally correct but written without regard for performance. The compiler then analyzes the code, identifies its memory access patterns, and generates optimized code. The proposed optimizations target a category of scientific and media processing algorithms characterized by input-data sharing: neighboring output pixels/elements are computed from overlapping input data. Many commonly used algorithms, such as matrix multiplication and convolution, have this characteristic. For these algorithms, we propose novel approaches to enforce memory coalescing and achieve effective data reuse. Our compiler framework also performs data prefetching and hardware-specific tuning automatically. Experimental results on a set of applications show that our compiler achieves very high performance, either superior or very close to that of the highly tuned NVIDIA CUBLAS 2.1 library.
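To make the idea of input-data sharing concrete, the following is a minimal CPU-side sketch in Python, not the compiler's actual transformation or generated code. It counts how many input elements are read from the source matrices when matrix multiplication is computed naively versus in tiles, where each tile of neighboring outputs stages its shared inputs once and reuses them (a stand-in for what on-chip shared memory would provide on a GPU). The function names `naive_matmul` and `tiled_matmul` and the load-counting model are assumptions made for illustration only.

```python
def naive_matmul(a, b, n):
    """Multiply two n x n matrices; every output reads its inputs directly.

    Returns the product and the number of input elements read.
    """
    loads = 0
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                loads += 2  # one element of a and one of b per multiply
            c[i][j] = acc
    return c, loads


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices tile by tile, reusing staged inputs.

    Each tile x tile block of outputs copies the input tiles it needs once
    (modeling a staging buffer) and all outputs in the block share them.
    Returns the product and the number of input elements read.
    """
    assert n % tile == 0, "this sketch assumes n is a multiple of tile"
    loads = 0
    c = [[0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Stage the shared inputs once for the whole output tile.
                a_tile = [[a[bi + i][bk + k] for k in range(tile)]
                          for i in range(tile)]
                b_tile = [[b[bk + k][bj + j] for j in range(tile)]
                          for k in range(tile)]
                loads += 2 * tile * tile
                # Every output in the tile reuses the staged data.
                for i in range(tile):
                    for j in range(tile):
                        acc = c[bi + i][bj + j]
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] = acc
    return c, loads
```

Under this counting model the naive version reads 2n³ input elements while the tiled version reads 2n³/tile, so the saving grows with the tile size. For example, calling both functions with `n = 8` and `tile = 4` on the same inputs yields identical products with a quarter as many reads in the tiled version.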
An optimizing compiler for GPGPU programs with input-data sharing
PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming