
Contract-based general-purpose GPU programming

Published: 26 October 2015

Abstract

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper suggests a programming library, SafeGPU, that aims at striking a balance between programmer productivity and performance, by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-by-contract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU.
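To make the idea of contracts on data-parallel operations concrete, here is a minimal sketch in Python. It is not SafeGPU's API: the class `Vector`, the methods `for_all` and `divided_by`, and the exception `ContractViolation` are hypothetical names chosen for illustration, and everything runs on the CPU. The point is only that preconditions and postconditions can themselves be written as collection-wide operations, which is the kind of check the abstract says can be executed on the GPU.

```python
from typing import Callable, List


class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


class Vector:
    """Hypothetical stand-in for a GPU-resident collection (CPU-only here)."""

    def __init__(self, items: List[float]) -> None:
        self._items = list(items)

    def count(self) -> int:
        return len(self._items)

    def to_list(self) -> List[float]:
        return list(self._items)

    def for_all(self, predicate: Callable[[float], bool]) -> bool:
        # Each element is tested independently, so a real library could
        # evaluate this check as a data-parallel reduction on the device.
        return all(predicate(x) for x in self._items)

    def divided_by(self, other: "Vector") -> "Vector":
        # Preconditions: same length, and no zero divisor anywhere.
        if self.count() != other.count():
            raise ContractViolation("precondition: vectors must have equal length")
        if not other.for_all(lambda x: x != 0.0):
            raise ContractViolation("precondition: divisor must contain no zeros")

        # Element-wise operation (the data-parallel part).
        result = Vector([a / b for a, b in zip(self._items, other._items)])

        # Postcondition: the result has as many elements as the input.
        if result.count() != self.count():
            raise ContractViolation("postcondition: length must be preserved")
        return result


if __name__ == "__main__":
    quotient = Vector([2.0, 9.0]).divided_by(Vector([1.0, 3.0]))
    print(quotient.to_list())
```

In this sketch the contract checks are ordinary runtime checks; a library in the spirit of the abstract would express them in the host language's contract syntax and could disable them in production builds.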

