Firepile: Run-time Compilation for GPUs in Scala
GPCE '11: Proceedings of the 10th ACM International Conference on Generative Programming and Component Engineering

Abstract
Recent advances have enabled GPUs to be used as general-purpose parallel processors on commodity hardware at little cost. However, the programmability of these devices has not kept pace with their performance. The GPU programming model imposes a number of restrictions that make these devices difficult to program. For example, software running on the GPU cannot perform dynamic memory allocation, so the programmer must pre-allocate all memory the GPU might use. To achieve good performance, GPU programmers must also manage how data moves between host and GPU memory and between the different levels of the GPU memory hierarchy.
We describe Firepile, a library for GPU programming in Scala. The library enables a subset of Scala to be executed on the GPU. Code trees can be created from run-time function values, then analyzed and transformed to generate GPU code. A key property of this mechanism is that it is modular: unlike other meta-programming constructs, the use of code trees need not be exposed in the library interface. Code trees are general and can be used by library writers in other application domains. Our experiments show that Firepile users can achieve performance comparable to that of C code targeted to the GPU, with shorter, simpler, and easier-to-understand code.