Abstract
We describe, implement and benchmark EigenCFA, an algorithm for accelerating higher-order control-flow analysis (specifically, 0CFA) with a GPU. Ultimately, our program transformations, reductions and optimizations achieve a factor of 72 speedup over an optimized CPU implementation.
We began our investigation with the view that GPUs accelerate high-arithmetic, data-parallel computations with a poor tolerance for branching. Taking that perspective to its limit, we reduced Shivers's abstract-interpretive 0CFA to an algorithm synthesized from linear-algebra operations. Central to this reduction were "abstract" Church encodings, and encodings of the syntax tree and abstract domains as vectors and matrices.
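The idea of expressing flow-set lookups and updates as linear algebra can be illustrated with a small sketch. This is not the paper's encoding: the store layout (rows are variables, columns are lambda terms) and the helper names `one_hot`, `bool_vecmat` and `or_into_row` are assumptions made for illustration, and the arithmetic is over the Boolean semiring (OR for addition, AND for multiplication).

```python
def one_hot(index, size):
    """Encode a syntactic entity (e.g. a variable) as a one-hot Boolean vector."""
    return [k == index for k in range(size)]


def bool_vecmat(vec, mat):
    """Boolean vector-matrix product: out[j] = OR_i (vec[i] AND mat[i][j])."""
    return [
        any(vec[i] and mat[i][j] for i in range(len(mat)))
        for j in range(len(mat[0]))
    ]


def or_into_row(mat, row_selector, values):
    """Return mat OR (row_selector^T outer values): a branch-free row update."""
    return [
        [mat[i][j] or (row_selector[i] and values[j]) for j in range(len(values))]
        for i in range(len(mat))
    ]


# Hypothetical abstract store: store[v][l] is True when lambda l may flow to variable v.
store = [
    [True, False, False],   # variable 0 may hold lambda 0
    [False, True, True],    # variable 1 may hold lambda 1 or lambda 2
    [False, False, False],  # variable 2 holds nothing yet
]

# Lookup: selecting a row is a vector-matrix product with a one-hot vector.
flows_of_var1 = bool_vecmat(one_hot(1, 3), store)

# Update: joining those values into variable 2 is an OR with an outer product.
updated = or_into_row(store, one_hot(2, 3), flows_of_var1)
```

Both operations are free of data-dependent branching, which is the property that makes this style of formulation attractive for a GPU.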
A straightforward (dense-matrix) implementation of EigenCFA performed slower than a fast CPU implementation. Ultimately, sparse-matrix data structures and operations turned out to be the critical accelerants. Because control-flow graphs are sparse in practice (up to 96% empty), our control-flow matrices are also sparse, giving the sparse matrix operations an overwhelming space and speed advantage.
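To see why sparsity matters, consider a minimal sketch of a compressed sparse row (CSR) layout for a Boolean matrix. CSR is used here only as a familiar example; the abstract does not say which sparse format EigenCFA uses, and the function names `to_csr` and `csr_bool_vecmat` are hypothetical.

```python
def to_csr(dense):
    """Convert a dense Boolean matrix to CSR: (row pointers, column indices)."""
    indptr, indices = [0], []
    for row in dense:
        indices.extend(j for j, bit in enumerate(row) if bit)
        indptr.append(len(indices))
    return indptr, indices


def csr_bool_vecmat(vec, indptr, indices, ncols):
    """Boolean vector-matrix product that touches only the stored entries."""
    out = [False] * ncols
    for i, selected in enumerate(vec):
        if selected:
            for j in indices[indptr[i]:indptr[i + 1]]:
                out[j] = True
    return out


# A small, mostly empty flow matrix (3 of 16 entries set).
dense = [
    [False, True, False, False],
    [False, False, False, False],
    [True, False, False, True],
    [False, False, False, False],
]
indptr, indices = to_csr(dense)

# Look up row 2 with a one-hot selector.
selector = [False, False, True, False]
row2 = csr_bool_vecmat(selector, indptr, indices, ncols=4)
```

The dense form stores and scans every cell, whereas the CSR form stores one column index per set entry plus one pointer per row, so both space and work scale with the number of non-empty entries.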
We also achieved speedups by carefully permitting data races. The monotonicity of 0CFA makes it sound to perform analysis operations in parallel, possibly using stale or even partially-updated data.
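The role of monotonicity can be sketched with a simplified set-constraint solver. This is a sequential simulation, not GPU code and not the paper's algorithm: the fixed list of subset constraints, the names `solve_in_place` and `solve_stale`, and the example data are all assumptions. Real 0CFA also discovers new flows as it runs, which this sketch omits. Because every update only adds elements, reading an out-of-date set can delay information but never introduces a wrong fact, so different schedules reach the same least fixed point.

```python
import random


def solve_in_place(init, edges, seed=0):
    """Apply constraints src <= dst in a shuffled order, reading fresh values."""
    flow = {k: set(v) for k, v in init.items()}
    rng = random.Random(seed)
    order = list(edges)
    changed = True
    while changed:
        changed = False
        rng.shuffle(order)
        for src, dst in order:
            if not flow[src] <= flow[dst]:
                flow[dst] |= flow[src]
                changed = True
    return flow


def solve_stale(init, edges):
    """Apply every constraint against a snapshot taken at the start of each pass."""
    flow = {k: set(v) for k, v in init.items()}
    changed = True
    while changed:
        changed = False
        snapshot = {k: frozenset(v) for k, v in flow.items()}  # stale reads
        for src, dst in edges:
            if not snapshot[src] <= flow[dst]:
                flow[dst] |= snapshot[src]
                changed = True
    return flow


# Hypothetical flow constraints: everything in src must also flow to dst.
init = {"a": {"lam1"}, "b": {"lam2"}, "c": set(), "d": set()}
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]

fresh = solve_in_place(init, edges, seed=1)
stale = solve_stale(init, edges)
```

Both solvers stop only when a full pass changes nothing, at which point every constraint holds; the stale variant may simply need more passes to get there.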
EigenCFA: accelerating flow analysis with GPUs
POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages