Abstract
Data-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU kernels achieve soundness and scalability by using a two-thread reduction and making the contents of the shared state nondeterministic each time threads synchronise at a barrier, to account for all possible thread interactions. This coarse abstraction prohibits verification of data-dependent kernels. We present barrier invariants, a novel abstraction technique which allows key properties about the shared state of a kernel to be preserved across barriers during formal reasoning. We have integrated barrier invariants with the GPUVerify tool, and present a detailed case study showing how they can be used to verify three prefix sum algorithms, allowing efficient modular verification of a stream compaction kernel, a key building block for GPU programming. This analysis goes significantly beyond what is possible using existing verification techniques for GPU kernels.
- M. Barnett et al. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, pages 364--387, 2005. Google Scholar
Digital Library
- A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113--132, 2012. Google Scholar
Digital Library
- M. Billeter, O. Olsson, and U. Assarsson. Efficient stream compaction on wide SIMD many-core architectures. In HPG, pages 159--166, 2009. Google Scholar
Digital Library
- G. E. Blelloch. Prefix sums and their applications. In J. H. Reif, editor, Synthesis of Parallel Algorithms. Morgan Kaufmann, 1990.Google Scholar
- R. P. Brent and H.-T. Kung. A regular layout for parallel adders. IEEE Trans. Computers, 31 (3): 260--264, 1982. Google Scholar
Digital Library
- C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, pages 209--224, 2008. Google Scholar
Digital Library
- S. Che et al. Rodinia: A benchmark suite for heterogeneous computing. In Workload Characterization, pages 44--54, 2009. Google Scholar
Digital Library
- W.-F. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In NFM, pages 213--228, 2013.Google Scholar
Cross Ref
- C.-T. Chou, P. K. Mannava, and S. Park. A simple method for parameterized verification of cache coherence protocols. In FMCAD, pages 382--398, 2004.Google Scholar
Cross Ref
- M. Christakis, P. Müller, and V. Wüstholz. Collaborative verification and testing with explicit assumptions. In FM, pages 132--146, 2012.Google Scholar
Cross Ref
- P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In HVC, pages 203--218, 2011. Google Scholar
Digital Library
- P. Collingbourne, A. F. Donaldson, J. Ketema, and S. Qadeer. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In ESOP, pages 270--289, 2013. Google Scholar
Digital Library
- P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238--252, 1977. Google Scholar
Digital Library
- L. M. de Moura and N. Bjørner. Efficient E-matching for SMT solvers. In CADE, pages 183--198, 2007. Google Scholar
Digital Library
- L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, pages 337--340, 2008. Google Scholar
Digital Library
- D. Detlefs, G. Nelson, and J. B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52 (3): 365--473, 2005. Google Scholar
Digital Library
- A. F. Donaldson, L. Haller, and D. Kroening. Strengthening induction-based race checking with lightweight static analysis. In VMCAI, pages 169--183, 2011. Google Scholar
Digital Library
- M. Harris, S. Sengupta, and J. D. Owens. Parallel prefix sum (scan) with CUDA. In H. Nguyen, editor, GPU Gems 3. Addison-Wesley, 2007.Google Scholar
- M. Huisman and M. Mihelčić. Specification and verification of GPGPU programs using permission-based separation logic. In BYTECODE, 2013.Google Scholar
- C. B. Jones. Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst., 5 (4): 596--619, 1983. Google Scholar
Digital Library
- R. Karmani, P. Madhusudan, and B. Moore. Thread contracts for safe parallelism. In PPoPP, pages 125--134, 2011. Google Scholar
Digital Library
- Khronos OpenCL Working Group. The OpenCL specification, version 1.2, 2012.Google Scholar
- P. M. Kogge and H. S. Stone. A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans. Computers, C-22 (8): 786--793, 1973. Google Scholar
Digital Library
- A. Leung, M. Gupta, Y. Agarwal, et al. Verifying GPU kernels by test amplification. In PLDI, pages 383--394, 2012. Google Scholar
Digital Library
- G. Li and G. Gopalakrishnan. Scalable SMT-based verification of GPU kernel functions. In FSE, pages 187--196, 2010. Google Scholar
Digital Library
- G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: Concolic verification and test generation for GPUs. In PPoPP, pages 215--224, 2012. Google Scholar
Digital Library
- P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, pages 29:1--29:10, 2012. Google Scholar
Digital Library
- K. McMillan. Verification of infinite state systems by compositional model checking. In CHARME, pages 219--234, 1999. Google Scholar
Digital Library
- M. Moskal. Programming with triggers. In SMT, pages 20--29, 2009. Google Scholar
Digital Library
- NVIDIA. CUDA C programming guide, version 5.0, 2012.Google Scholar
- S. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Inf., 6: 319--340, 1976.Google Scholar
Digital Library
- S. F. Siegel and T. K. Zirkel. Loop invariant symbolic execution for parallel programs. In VMCAI, pages 412--427, 2012. Google Scholar
Digital Library
- J. Stratton et al. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Technical Report IMPACT-12-01, UIUC, 2012.Google Scholar
- M. Talupur and M. R. Tuttle. Going with the flow: Parameterized verification using message flows. In FMCAD, pages 1--8, 2008. Google Scholar
Digital Library
Index Terms
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
Recommendations
The Design and Implementation of a Verification Technique for GPU Kernels
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous,...
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsWe present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applicationsData-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU ...







Comments