Abstract
Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming GPUs and the low-level control of the hardware required to achieve good performance. This paper proposes a programming library, SafeGPU, that aims to strike a balance between programmer productivity and performance by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-by-contract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU.
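To make the design-by-contract idea concrete, the following is a minimal sketch of a data-parallel operation guarded by an executable precondition and postcondition. It is not SafeGPU's API: the names `ContractViolation` and `vector_add` are hypothetical, the code runs on the CPU in plain Python, and it only illustrates that a postcondition over a collection is itself an elementwise (data-parallel) check, which is the property that would allow such a contract to be evaluated on a GPU.

```python
from typing import List, Sequence


class ContractViolation(AssertionError):
    """Hypothetical error type: raised when a contract clause does not hold."""


def vector_add(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Elementwise sum of two vectors, with executable contracts.

    Illustrative only; the contracts here are checked on the CPU.
    """
    # Precondition: both operands must have the same number of elements.
    if len(a) != len(b):
        raise ContractViolation("precondition failed: vectors differ in length")

    # The operation itself: each output element depends only on the
    # corresponding input elements, so it is trivially data-parallel.
    result = [x + y for x, y in zip(a, b)]

    # Postcondition: the result has the right size and every element is
    # the sum of its inputs. This check is also elementwise, so it has
    # the same data-parallel shape as the operation it specifies.
    if len(result) != len(a):
        raise ContractViolation("postcondition failed: wrong result length")
    if not all(r == x + y for r, x, y in zip(result, a, b)):
        raise ContractViolation("postcondition failed: element mismatch")

    return result


if __name__ == "__main__":
    print(vector_add([1, 2, 3], [10, 20, 30]))
```

Calling `vector_add` with vectors of different lengths raises `ContractViolation` before any computation takes place, so the fault is attributed to the caller rather than to the operation.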
Index Terms
Contract-based general-purpose GPU programming
GPCE 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences