Abstract
Languages such as OpenCL and CUDA offer a standard interface for general-purpose programming of GPUs. However, with these languages, programmers must explicitly manage numerous low-level details involving communication and synchronization. This burden makes programming GPUs difficult and error-prone, rendering these powerful devices inaccessible to most programmers.
We desire a higher-level programming model that makes GPUs more accessible while also effectively exploiting their computational power. This paper presents features of Lime, a new Java-compatible language targeting heterogeneous systems, that allow an optimizing compiler to generate high quality GPU code. The key insight is that the language type system enforces isolation and immutability invariants that allow the compiler to optimize for a GPU without heroic compiler analysis.
GPU code generated by our compiler achieves between 75% and 140% of the performance of native OpenCL code.
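To make the isolation and immutability idea concrete, the following is a minimal sketch in plain Java, not Lime syntax. The names `Vec2`, `IsolationSketch`, `lengthSquared`, and `mapLengthSquared` are hypothetical and do not come from the paper. The sketch assumes that "immutable" means a value whose fields never change after construction, and that "isolated" means a function that reads only its arguments and touches no shared state. Under those assumptions each element can be processed independently and in any order, which is the property a compiler would need before offloading such a map to a GPU.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Plain Java illustration only; this is NOT Lime syntax and all names are hypothetical.
// An immutable value: both fields are final and set once in the constructor.
final class Vec2 {
    final float x;
    final float y;

    Vec2(float x, float y) {
        this.x = x;
        this.y = y;
    }
}

final class IsolationSketch {
    // "Isolated" in the informal sense assumed here: the result depends only
    // on the argument, and no global or shared state is read or written.
    static float lengthSquared(Vec2 v) {
        return v.x * v.x + v.y * v.y;
    }

    // A data-parallel map. Each index writes only its own output slot, so the
    // iterations are independent and may run in any order.
    // Note: Java does not stop a caller from mutating the input array during
    // the map; the abstract's point is that a type system can rule this out.
    static float[] mapLengthSquared(Vec2[] input) {
        float[] out = new float[input.length];
        IntStream.range(0, input.length)
                 .parallel()
                 .forEach(i -> out[i] = lengthSquared(input[i]));
        return out;
    }

    public static void main(String[] args) {
        Vec2[] points = { new Vec2(3f, 4f), new Vec2(1f, 2f), new Vec2(0f, 0f) };
        // prints [25.0, 5.0, 0.0]
        System.out.println(Arrays.toString(mapLengthSquared(points)));
    }
}
```

In Java these properties hold only by convention, so a compiler would have to prove them with analysis. The abstract's claim is that Lime makes them part of the type system instead.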
Compiling a high-level language for GPUs: (via language support for architectures and compilers)
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation