Abstract
We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invariants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.
- AMD. AMD Accelerated Parallel Processing (APP) SDK. developer.amd.com/sdks/amdappsdk/pages/default.aspxGoogle Scholar
- M. Barnett, B. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, 2005. Google Scholar
Digital Library
- D. Beyer, T. A. Henzinger, R. Majumdar, and A. Rybalchenko. Path invariants. In PLDI, 2007. Google Scholar
Digital Library
- M. Boyer, K. Skadron, and W. Weimer. Automated dynamic analysis of CUDA programs. In STMCS, 2008.Google Scholar
- C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008. Google Scholar
Digital Library
- P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In HVC, 2011.Google Scholar
- P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google Scholar
Digital Library
- L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, 2008. Google Scholar
Digital Library
- A. Donaldson, D. Kroening and P. Rummer. Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In TACAS, 2010. Google Scholar
Digital Library
- A. Donaldson, D. Kroening and P. Rummer. Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39(1):83--113. Google Scholar
Digital Library
- M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69(1-3):35--45. Google Scholar
Digital Library
- C. Flanagan and K. R. M. Leino. Houdini, an annotation assistant for ESC/Java. In FME, 2001. Google Scholar
Digital Library
- S. Graf and H. Sa idi. Construction of abstract state graphs with PVS. In CAV, 1997. Google Scholar
Digital Library
- A. Habermaier and A. Knapp. On the correctness of the SIMT execution model of GPUs. In ESOP, 2012. Google Scholar
Digital Library
- T. Kahsai, Y. Ge, and C. Tinelli. Instantiation-based invariant discovery. In NFM, 2011. Google Scholar
Digital Library
- R. K. Karmani, P. Madhusudan, and B. Moore. Thread contracts for safe parallelism. In PPOPP, 2011. Google Scholar
Digital Library
- Khronos OpenCL Working Group. The OpenCL specifica-tion, version 1.1, 2011. Document Revision: 44.Google Scholar
- S. K. Lahiri and S. Qadeer. Complexity and algorithms for monomial and clausal predicate. In CADE, 2009. Google Scholar
Digital Library
- K. R. M. Leino, G. Nelson, and J. B. Saxe. ESC/Java user's manual. Technical Note 2000-002, Compaq Systems Research Center, October 2000.Google Scholar
- A. Leung, M. Gupta, Y. Agarwal, R. Gupta, R. Jhala, and S. Lerner. Verifying GPU kernels by test amplification. In PLDI, 2012. Google Scholar
Digital Library
- G. Li and G. Gopalakrishnan. Scalable SMT-based verifica-tion of GPU kernel functions. In FSE, 2010. Google Scholar
Digital Library
- G. Li, P. Li, G. Gopalakrishnan. Parametric flows: automated behaviour equivalencing for symbolic analysis of races in CUDA programs. In SC, 2012. Google Scholar
Digital Library
- G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In PPOPP, 2012. Google Scholar
Digital Library
- llvm.org. clang: a C language family frontend for LLVM. clang.llvm.orgGoogle Scholar
- A. Lokhmotov. Mobile and embedded computing on Mali GPUs. In UK GPU Computing Conference, 2011.Google Scholar
- K. L. McMillan. Lazy abstraction with interpolants. In CAV, 2006. Google Scholar
Digital Library
- Microsoft Corporation. C++ AMP sample projects for download (MSDN blog). blogs.msdn.com/b/nativeconcurrency/archive/2012/01/30/c-amp-sample-projects-for-download.aspxGoogle Scholar
- D. Moth and Y. Levanoni. Microsoft's C++ AMP unveiled. www.drdobbs.com/windows/231600761Google Scholar
- NVIDIA. CUDA Toolkit Release Archive. developer.nvidia.com/cuda-toolkit-archiveGoogle Scholar
- NVIDIA. CUDA C programming guide, v4.0, 2011.Google Scholar
- NVIDIA. PTX: Parallel thread execution ISA, v2.3, 2011.Google Scholar
- L. Nyland. Personal communication, April 2012.Google Scholar
- L. Nyland, M. Harris, and J. Prins. Fast N-body simulation with CUDA. GPU Gems 3, Chapter 31. Addison-Wesley, 2007.Google Scholar
- Rightware Oy. Basemark CL. www.rightware.com/en/Benchmarking+Software/Basemark%99+CLGoogle Scholar
- S. Srivastava and S. Gulwani. Program verification using templates over predicate abstraction. In PLDI, 2009. Google Scholar
Digital Library
- B. Steensgaard. Points-to analysis in almost linear time. In POPL, 1996. Google Scholar
Digital Library
- S. Stone, J. Haldar, S. Tsao, W. Hwu, B. Sutton, and Z. Liang. Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distrib. Comput., 68(10):1307--1318, 2008. Google Scholar
Digital Library
- S. Tripakis, C. Stergiou, and R. Lublinerman. Checking non-interference in SPMD programs. In HotPar, 2010.Google Scholar
Index Terms
GPUVerify: a verifier for GPU kernels
Recommendations
The Design and Implementation of a Verification Technique for GPU Kernels
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous,...
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsWe present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
OOPSLA '13Data-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU ...







Comments