Abstract
Given an edge-labeled graph, context-free language reachability (CFL-reachability) computes reachable node pairs by deriving new edges and adding them to the graph. The redundancy that limits the scalability of CFL-reachability manifests as redundant derivations, i.e., identical edges can be derived multiple times due to the many paths between two reachable nodes. We observe that most redundancy arises from the derivations involving transitive relations of reachable node pairs. Unfortunately, existing techniques for reducing redundancy in transitive-closure-based problems are either ineffective or inapplicable to identifying and eliminating redundant derivations during on-the-fly CFL-reachability solving.
This paper proposes a scalable yet precision-preserving approach to all-pairs CFL-reachability analysis by taming its transitive redundancy. Our key insight is that transitive relations are intrinsically ordered, and utilizing the order for edge derivation can avoid most redundancy. To address the challenges in determining the derivation order from the dynamically changed graph during CFL-reachability solving, we introduce a hybrid graph representation by combining spanning trees and adjacency lists, together with a dynamic construction algorithm. Based on this representation, we propose a fast and effective partially ordered algorithm POCR to boost the performance of CFL-reachability analysis by reducing its transitive redundancy during on-the-fly solving. Our experiments on context-sensitive value-flow analysis and field-sensitive alias analysis for C/C++ demonstrate the promising performance of POCR. On average, POCR eliminates 98.50% and 97.26% redundant derivations respectively for the value-flow and alias analysis, achieving speedups of 21.48× and 19.57× over the standard CFL-reachability algorithm. We also compare POCR with two recent open-source tools, Graspan (a CFL-reachability solver) and Soufflé (a Datalog engine). The results demonstrate that POCR is over 3.67× faster than Graspan and Soufflé on average for both value-flow analysis and alias analysis.
- Alfred V. Aho, Michael R Garey, and Jeffrey D. Ullman. 1972. The transitive reduction of a directed graph. SIAM J. Comput., 1, 2 (1972), 131–137. https://doi.org/10.1137/0201008
Google Scholar
Digital Library
- Rajeev Alur, Michael Benedikt, Kousha Etessami, Patrice Godefroid, Thomas Reps, and Mihalis Yannakakis. 2005. Analysis of recursive state machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 27, 4 (2005), 786–818. https://doi.org/10.1007/3-540-44585-4_18
Google Scholar
Cross Ref
- Martin Bravenboer and Yannis Smaragdakis. 2009. Strictly declarative specification of sophisticated points-to analyses. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications. 243–262. https://doi.org/10.1145/1640089.1640108
Google Scholar
Digital Library
- Krishnendu Chatterjee, Bhavya Choudhary, and Andreas Pavlogiannis. 2018. Optimal Dyck reachability for data-dependence and alias analysis. Proc. ACM Program. Lang., 2, POPL (2018), 30:1–30:30. https://doi.org/10.48550/arXiv.1910.00241
Google Scholar
Digital Library
- Swarat Chaudhuri. 2008. Subcubic algorithms for recursive state machines. In Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 159–169. https://doi.org/10.1145/1328897.1328460
Google Scholar
Digital Library
- Manuel Fähndrich, Jeffrey S Foster, Zhendong Su, and Alexander Aiken. 1998. Partial online cycle elimination in inclusion constraint graphs. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation. 85–96. https://doi.org/10.1145/277652.277667
Google Scholar
Digital Library
- Olivier Gauwin, Anca Muscholl, and Michael Raskin. 2019. Minimization of visibly pushdown automata is NP-complete. arXiv preprint arXiv:1907.09563, https://doi.org/10.48550/arXiv.1907.09563
Google Scholar
- Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation. 290–299. https://doi.org/10.1145/1273442.1250767
Google Scholar
Digital Library
- Matthias Heizmann, Christian Schilling, and Daniel Tischner. 2017. Minimization of visibly pushdown automata using partial Max-SAT. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. 461–478. https://doi.org/10.1007/978-3-662-54577-5_27
Google Scholar
Digital Library
- Harry T Hsu. 1975. An algorithm for finding a minimal equivalent graph of a digraph. Journal of the ACM (JACM), 22, 1 (1975), 11–16. https://doi.org/10.1145/321864.321866
Google Scholar
Digital Library
- Giuseppe F. Italiano. 1986. Amortized efficiency of a path retrieval data structure. Theoretical Computer Science, 48 (1986), 273–281. https://doi.org/10.1016/0304-3975(86)90098-8
Google Scholar
Cross Ref
- Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé. https://github.com/souffle-lang/souffle
Google Scholar
- Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On synthesis of program analyzers. In International Conference on Computer Aided Verification. 422–430. https://doi.org/10.1007/978-3-319-41540-6_23
Google Scholar
Cross Ref
- John Kodumal and Alex Aiken. 2004. The set constraint/CFL reachability connection in practice. ACM Sigplan Notices, 39, 6 (2004), 207–218. https://doi.org/10.1145/996893.996867
Google Scholar
Digital Library
- Yuxiang Lei and Yulei Sui. 2019. Fast and precise handling of positive weight cycles for field-sensitive pointer analysis. In International Static Analysis Symposium. 27–47. https://doi.org/10.1007/978-3-030-32304-2_3
Google Scholar
Digital Library
- Yuxiang Lei, Yulei Sui, Shuo Ding, and Qirun Zhang. 2022. Artifact of “Taming Transitive Redundancy for Context-Free Language Reachability”. https://doi.org/10.5281/zenodo.7066401
Google Scholar
Digital Library
- Yuanbo Li, Qirun Zhang, and Thomas Reps. 2020. Fast graph simplification for interleaved Dyck-reachability. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 780–793. https://doi.org/10.1145/3492428
Google Scholar
Digital Library
- David Melski and Thomas Reps. 2000. Interconvertibility of a class of set constraints and context-free-language reachability. Theoretical Computer Science, 248, 1-2 (2000), 29–98. https://doi.org/10.1016/S0304-3975(00)00049-9
Google Scholar
Digital Library
- Dennis M Moyles and Gerald L Thompson. 1969. An algorithm for finding a minimum equivalent graph of a digraph. Journal of the ACM (JACM), 16, 3 (1969), 455–460. https://doi.org/10.1145/321526.321534
Google Scholar
Digital Library
- Nomair A Naeem and Ondrej Lhoták. 2008. Typestate-like analysis of multiple interacting objects. ACM Sigplan Notices, 43, 10 (2008), 347–366. https://doi.org/10.1145/1449955.1449792
Google Scholar
Digital Library
- Patrick Nappa, David Zhao, Pavle Subotić, and Bernhard Scholz. 2019. Fast parallel equivalence relations in a datalog compiler. In 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). 82–96. https://doi.org/10.1109/PACT.2019.00015
Google Scholar
Cross Ref
- Esko Nuutila and Eljas Soisalon-Soininen. 1994. On finding the strongly connected components in a directed graph. Inform. Process. Lett., 49, 1 (1994), 9–14. https://doi.org/10.1016/0020-0190(94)90047-7
Google Scholar
Digital Library
- Fernando Magno Quintao Pereira and Daniel Berlin. 2009. Wave propagation and deep propagation for pointer analysis. In 2009 International Symposium on Code Generation and Optimization. 126–135. https://doi.org/10.1109/CGO.2009.9
Google Scholar
Digital Library
- Jakob Rehof and Manuel Fähndrich. 2001. Type-base flow analysis: from polymorphic subtyping to CFL-reachability. ACM SIGPLAN Notices, 36, 3 (2001), 54–66. https://doi.org/10.1145/360204.360208
Google Scholar
Digital Library
- Thomas Reps. 1995. Shape analysis as a generalized path problem. In Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation. 1–11. https://doi.org/10.1145/215465.215466
Google Scholar
Digital Library
- Thomas Reps. 2000. Undecidability of context-sensitive data-dependence analysis. ACM Transactions on Programming Languages and Systems (TOPLAS), 22, 1 (2000), 162–186. https://doi.org/10.1145/345099.345137
Google Scholar
Digital Library
- Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 49–61. https://doi.org/10.1145/199448.199462
Google Scholar
Digital Library
- Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution for scaling points-to analysis. Acm Sigplan Notices, 35, 5 (2000), 47–56. https://doi.org/10.1145/358438.349310
Google Scholar
Digital Library
- Yu Su, Ding Ye, and Jingling Xue. 2014. Parallel pointer analysis with CFL-reachability. In 2014 43rd International Conference on Parallel Processing. 451–460. https://doi.org/10.1109/ICPP.2014.54
Google Scholar
Digital Library
- Zhendong Su, Manuel Fähndrich, and Alexander Aiken. 2000. Projection merging: Reducing redundancies in inclusion constraint graphs. In Proceedings of the 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 81–95. https://doi.org/10.1145/325694.325706
Google Scholar
Digital Library
- Yulei Sui and Jingling Xue. 2016. SVF: Interprocedural Static Value-Flow Analysis in LLVM. In CC ’16. 265–266. https://doi.org/10.1145/2892208.2892235
Google Scholar
Digital Library
- Yulei Sui, Ding Ye, and Jingling Xue. 2014. Detecting memory leaks statically with full-sparse value-flow analysis. IEEE Transactions on Software Engineering, 40, 2 (2014), 107–122. https://doi.org/10.1109/TSE.2014.2302311
Google Scholar
Digital Library
- Robert Tarjan. 1972. Depth-first search and linear graph algorithms. SIAM journal on computing, 1, 2 (1972), 146–160. https://doi.org/10.1109/SWAT.1971.10
Google Scholar
Digital Library
- Haijun Wang, Xiaofei Xie, Yi Li, Cheng Wen, Yuekang Li, Yang Liu, Shengchao Qin, Hongxu Chen, and Yulei Sui. 2020. Typestate-Guided Fuzzer for Discovering Use-after-Free Vulnerabilities. In 42nd International Conference on Software Engineering. https://doi.org/10.1145/3377811.3380386
Google Scholar
Digital Library
- Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2017. Graspan: A single-machine disk-based graph system for interprocedural static analyses of large-scale systems code. ACM SIGARCH Computer Architecture News, 45, 1 (2017), 389–404. https://doi.org/10.1145/3093336.3037744
Google Scholar
Digital Library
- Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2020. Graspan-cpp. https://github.com/Graspan/graspan-cpp
Google Scholar
- Guoqing Xu, Atanas Rountev, and Manu Sridharan. 2009. Scaling CFL-reachability-based points-to analysis using context-sensitive must-not-alias analysis. In European Conference on Object-Oriented Programming. 98–122. https://doi.org/10.1007/978-3-642-03013-0_6
Google Scholar
Digital Library
- Hao Yuan and Patrick Eugster. 2009. An efficient algorithm for solving the dyck-cfl reachability problem on trees. In European Symposium on Programming. 175–189. https://doi.org/10.1007/978-3-642-00590-9_13
Google Scholar
Digital Library
- Qirun Zhang, Michael R Lyu, Hao Yuan, and Zhendong Su. 2013. Fast algorithms for Dyck-CFL-reachability with applications to alias analysis. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation. 435–446. https://doi.org/10.1145/2491956.2462159
Google Scholar
Digital Library
- Xin Zheng and Radu Rugina. 2008. Demand-driven alias analysis for C. In Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 197–208. https://doi.org/10.1145/1328897.1328464
Google Scholar
Digital Library
Index Terms
Taming transitive redundancy for context-free language reachability
Recommendations
Precise and scalable context-sensitive pointer analysis via value flow graph
ISMM '13: Proceedings of the 2013 international symposium on memory managementIn this paper, we propose a novel method for context-sensitive pointer analysis using the value flow graph (VFG) formulation. We achieve context-sensitivity by simultaneously applying function cloning and computing context-free language reachability (...
Precise and scalable context-sensitive pointer analysis via value flow graph
ISMM '13: Proceedings of the 2013 international symposium on memory managementIn this paper, we propose a novel method for context-sensitive pointer analysis using the value flow graph (VFG) formulation. We achieve context-sensitivity by simultaneously applying function cloning and computing context-free language reachability (...
Precise and scalable context-sensitive pointer analysis via value flow graph
ISMM '13In this paper, we propose a novel method for context-sensitive pointer analysis using the value flow graph (VFG) formulation. We achieve context-sensitivity by simultaneously applying function cloning and computing context-free language reachability (...






Comments