An abstract interpretation for SPMD divergence on reducible control flow graphs

Published: 04 January 2021
Abstract

Vectorizing compilers employ divergence analysis to detect at which program points a specific variable is uniform, i.e., has the same value on all SPMD threads that execute that program point. They exploit uniformity to retain branching, which counters branch divergence, and to defer computations to scalar processor units. Divergence is a hyper-property and is closely related to non-interference and binding time. Several divergence, binding-time, and non-interference analyses already exist, but they either sacrifice precision or significantly restrict the syntactic structure of the program in order to achieve soundness.
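To make the notion of uniformity concrete, the following is a minimal sketch of a data-dependence-only divergence check. It is not the analysis presented in this paper: it assumes a hypothetical toy IR of `(dest, operands)` pairs, handles straight-line code only, and therefore ignores divergence induced by control flow, which is the hard case on general CFGs. The names `divergent_variables`, `tid`, and the sample program are illustrative assumptions.

```python
# Toy illustration only: straight-line code, hypothetical IR of (dest, operands).
# Assumption: no branches, so control-induced divergence is out of scope.

def divergent_variables(instructions, divergent_sources):
    """Return the set of variables that may differ between SPMD threads."""
    divergent = set(divergent_sources)
    for dest, operands in instructions:
        if any(op in divergent for op in operands):
            # A result computed from a divergent operand may itself diverge.
            divergent.add(dest)
        else:
            # Redefined from uniform operands only: uniform from here on.
            divergent.discard(dest)
    return divergent


program = [
    ("base", ["n"]),           # base = n * 4       (all threads agree on n)
    ("idx", ["tid", "base"]),  # idx  = tid + base  (depends on the thread id)
    ("val", ["idx"]),          # val  = a[idx]
    ("lim", ["base", "n"]),    # lim  = base + n
]

result = divergent_variables(program, {"tid"})
uniform = {dest for dest, _ in program} - result
```

In this sketch, `base` and `lim` stay uniform and could in principle be computed once on a scalar unit, while `idx` and `val` are marked divergent. A sound analysis must additionally mark values that are merged after a branch on a divergent condition.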

In this paper, we present the first abstract interpretation for uniformity that is general enough to be applicable to reducible CFGs and, at the same time, more precise than other analyses that achieve at least the same generality.

Our analysis comes with a correctness proof that is to a large part mechanized in Coq. Our experimental evaluation shows that the compile time and the precision of our analysis are on par with LLVM's default divergence analysis, which is only sound on more restricted CFGs. At the same time, our analysis is faster and achieves better precision than a state-of-the-art non-interference analysis that is sound and at least as general as our analysis.
