Abstract
Vectorizing compilers employ divergence analysis to detect at which program points a specific variable is uniform, i.e., has the same value on all SPMD threads that execute that program point. They exploit uniformity to retain branching, which counters branch divergence, and to defer computations to scalar processor units. Divergence is a hyper-property and is closely related to non-interference and binding time. Several divergence, binding-time, and non-interference analyses already exist, but they either sacrifice precision or place significant restrictions on the syntactic structure of the program in order to achieve soundness.
In this paper, we present the first abstract interpretation for uniformity that is general enough to be applicable to reducible CFGs and, at the same time, more precise than other analyses that achieve at least the same generality.
Our analysis comes with a correctness proof that is to a large part mechanized in Coq. Our experimental evaluation shows that the compile time and the precision of our analysis are on par with LLVM's default divergence analysis, which is only sound on more restricted CFGs. At the same time, our analysis is faster and achieves better precision than a state-of-the-art non-interference analysis that is sound and at least as general as our analysis.
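To make the notion of uniformity concrete, the following is a minimal sketch of a purely data-dependence-based divergence analysis on straight-line code. It is not the abstract interpretation presented in the paper: it ignores control flow entirely, and the instruction encoding, the function name `divergent_vars`, and the variable names (`tid`, `arg_n`, and so on) are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: a data-dependence divergence analysis on
# straight-line code. This is NOT the paper's analysis; it does not model
# control flow, so it cannot handle divergence caused by branches.

def divergent_vars(instructions, divergent_seeds):
    """Return the set of variables that may differ between SPMD threads.

    instructions: list of (dest, operands) pairs (hypothetical encoding).
    divergent_seeds: variables known to differ per thread, e.g. a thread id.
    """
    divergent = set(divergent_seeds)
    changed = True
    # Iterate to a fixpoint: a value is divergent if any operand is divergent.
    while changed:
        changed = False
        for dest, operands in instructions:
            if dest not in divergent and any(op in divergent for op in operands):
                divergent.add(dest)
                changed = True
    return divergent


# Hypothetical program: "tid" is the per-thread id, "arg_n" a kernel argument.
program = [
    ("n", ["arg_n"]),      # n = arg_n      (same on all threads)
    ("i", ["tid", "n"]),   # i = tid * n    (depends on the thread id)
    ("k", ["n", "n"]),     # k = n + n      (same on all threads)
    ("a", ["i", "k"]),     # a = i + k      (depends on i)
]

result = divergent_vars(program, {"tid"})
# Everything defined in the program that was not marked divergent is uniform.
uniform = {dest for dest, _ in program} - result
```

In this sketch, `n` and `k` remain uniform and could be computed once on a scalar unit, while `i` and `a` must be treated as per-thread values. A sound analysis for real programs additionally has to account for values that become divergent because threads take different paths through the CFG, which is the part this sketch omits.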
Supplemental Material
The appendix includes the full evaluation results, a description of the artifacts accompanying the paper and a detailed implementation guide for the analysis.
Index Terms
An abstract interpretation for SPMD divergence on reducible control flow graphs