Abstract
Despite decades of progress, static analysis tools still have great difficulty dealing with programs that combine arithmetic, loops, dynamic memory allocation, and linked data structures. In this paper we draw attention to two fundamental reasons for this difficulty: First, typical underlying program abstractions are low-level and inherently scalar, characterizing compound entities like data structures or results computed through iteration only indirectly. Second, to ensure termination, analyses typically project away the dimension of time, and merge information per program point, which incurs a loss in precision.
As a remedy, we propose to make collective operations first-class in program analysis, inspired by Σ-notation in mathematics and by the success of high-level intermediate languages based on map/reduce operations in program generators and aggressive optimizing compilers for domain-specific languages (DSLs). We further propose a novel structured heap abstraction that preserves a symbolic dimension of time, reflecting the program’s loop structure and thus unambiguously correlating multiple temporal points in the dynamic execution with a single point in the program text.
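To make the two ideas concrete, here is a minimal sketch in Python. It is not the paper's intermediate language or tool: the function names (`loop_sum`, `collective_sum`, `closed_form`, `s_at`, `loop_trace`) are hypothetical and chosen only for illustration. The sketch contrasts a scalar loop with the same computation written as a Σ-style collective operation, and then indexes the loop variable by an explicit iteration count, which is one simple reading of "a symbolic dimension of time".

```python
def loop_sum(n):
    """Scalar, imperative form: accumulate 0 + 1 + ... + (n - 1)."""
    s = 0
    i = 0
    while i < n:
        s += i
        i += 1
    return s


def collective_sum(n):
    """Collective form: the whole loop as one sum over i in [0, n)."""
    return sum(range(n))


def closed_form(n):
    """Closed form that a symbolic simplifier could derive from the sum.

    The guard covers n <= 0, where the loop body never runs.
    """
    return n * (n - 1) // 2 if n > 0 else 0


def s_at(t):
    """Value of s after t iterations (t >= 0): state indexed by time."""
    return t * (t - 1) // 2


def loop_trace(n):
    """Record s after each iteration, starting with the initial state."""
    trace = [0]
    s = 0
    for i in range(n):
        s += i
        trace.append(s)
    return trace
```

In this reading, `s_at(t)` describes every intermediate state of the loop with a single expression, instead of merging all iterations into one invariant per program point; `loop_trace` only serves to check that description against a concrete run.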
This paper presents a formal model based on a high-level intermediate analysis language, a practical realization in a prototype tool that analyzes C code, and an experimental evaluation that demonstrates competitive results on a series of benchmarks. Remarkably, our implementation achieves these results in a fully semantics-preserving strongest-postcondition model, which is a worst case for analysis and verification. The underlying ideas, however, are not tied to this model and would apply equally in other settings, e.g., demand-driven invariant inference in a weakest-precondition model. Because it preserves semantics, our implementation is not limited to analysis for verification: it can also check program equivalence and translate legacy C code to high-performance DSLs.
Index Terms
Precise reasoning with structured time, structured heaps, and collective operations