Taming transitive redundancy for context-free language reachability

Published: 31 October 2022
Abstract

Given an edge-labeled graph, context-free language reachability (CFL-reachability) computes reachable node pairs by deriving new edges and adding them to the graph. The redundancy that limits the scalability of CFL-reachability manifests as redundant derivations, i.e., identical edges can be derived multiple times due to the many paths between two reachable nodes. We observe that most redundancy arises from the derivations involving transitive relations of reachable node pairs. Unfortunately, existing techniques for reducing redundancy in transitive-closure-based problems are either ineffective or inapplicable to identifying and eliminating redundant derivations during on-the-fly CFL-reachability solving.
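To make the notion of a redundant derivation concrete, the following is a minimal sketch of a textbook-style worklist CFL-reachability solver, instrumented to count how often an already-present edge is derived again. It is not the implementation evaluated in the paper; the function name `cfl_reach`, the rule encoding (`unary`, `binary`), and the counters are assumptions made for illustration, and the grammar is assumed to contain only rules of the form X ::= Y and X ::= Y Z.

```python
from collections import defaultdict, deque


def cfl_reach(edges, unary, binary):
    """Worklist CFL-reachability over rules X ::= Y and X ::= Y Z.

    edges:  iterable of (src, label, dst) input edges
    unary:  dict mapping label Y to the set of heads X with X ::= Y
    binary: list of (X, Y, Z) triples, one per rule X ::= Y Z
    Returns (edge_set, derived, redundant), where `derived` counts every
    derivation attempt and `redundant` counts attempts whose edge existed.
    """
    edge_set = set()
    out_edges = defaultdict(set)  # node -> {(label, dst)}
    in_edges = defaultdict(set)   # node -> {(label, src)}
    worklist = deque()
    stats = {"derived": 0, "redundant": 0}

    def add(src, label, dst, is_derivation=True):
        if is_derivation:
            stats["derived"] += 1
        edge = (src, label, dst)
        if edge in edge_set:
            if is_derivation:
                stats["redundant"] += 1  # same edge derived again
            return
        edge_set.add(edge)
        out_edges[src].add((label, dst))
        in_edges[dst].add((label, src))
        worklist.append(edge)

    for src, label, dst in edges:
        add(src, label, dst, is_derivation=False)

    while worklist:
        u, y, v = worklist.popleft()
        for head in unary.get(y, ()):
            add(u, head, v)
        for head, left, right in binary:
            if left == y:  # popped edge is the left operand
                for lbl, w in list(out_edges[v]):
                    if lbl == right:
                        add(u, head, w)
            if right == y:  # popped edge is the right operand
                for lbl, t in list(in_edges[u]):
                    if lbl == left:
                        add(t, head, v)
    return edge_set, stats["derived"], stats["redundant"]


# A four-node chain under the transitive grammar A ::= a | A A.
chain = [(0, "a", 1), (1, "a", 2), (2, "a", 3)]
result, derived, redundant = cfl_reach(chain, {"a": {"A"}}, [("A", "A", "A")])
print(sum(1 for e in result if e[1] == "A"), derived, redundant)
```

Even on this small chain, the transitive rule A ::= A A reaches the same A-edge through several intermediate nodes, so `redundant` is nonzero; this is the kind of repeated work the abstract attributes to transitive relations.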

This paper proposes a scalable yet precision-preserving approach to all-pairs CFL-reachability analysis by taming its transitive redundancy. Our key insight is that transitive relations are intrinsically ordered, and utilizing the order for edge derivation can avoid most redundancy. To address the challenges in determining the derivation order from the dynamically changing graph during CFL-reachability solving, we introduce a hybrid graph representation that combines spanning trees and adjacency lists, together with a dynamic construction algorithm. Based on this representation, we propose a fast and effective partially ordered algorithm, POCR, to boost the performance of CFL-reachability analysis by reducing its transitive redundancy during on-the-fly solving. Our experiments on context-sensitive value-flow analysis and field-sensitive alias analysis for C/C++ demonstrate the promising performance of POCR. On average, POCR eliminates 98.50% and 97.26% of the redundant derivations for the value-flow and alias analyses, respectively, achieving speedups of 21.48× and 19.57× over the standard CFL-reachability algorithm. We also compare POCR with two recent open-source tools, Graspan (a CFL-reachability solver) and Soufflé (a Datalog engine). The results demonstrate that POCR is over 3.67× faster than Graspan and Soufflé on average for both value-flow analysis and alias analysis.
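As a rough illustration of how spanning trees can impose an order on transitive edge derivation, the sketch below maintains an incremental transitive closure in the style of Italiano's classic path-retrieval data structure: each node keeps a spanning tree of the nodes it reaches, and a new edge is propagated by walking the target's tree and pruning at nodes that are already reached. This is not POCR and not the paper's hybrid representation; the class `SpanningTreeClosure`, its fields, and the `updates` counter are hypothetical names, and the sketch covers only a single transitive relation, not a general context-free grammar.

```python
class SpanningTreeClosure:
    """Incremental transitive closure with one spanning tree per node."""

    def __init__(self, num_nodes):
        n = num_nodes
        self.n = n
        # reach[x][v] is True when x reaches v (every node reaches itself).
        self.reach = [[x == v for v in range(n)] for x in range(n)]
        # children[x][u] lists the children of u in the tree rooted at x.
        self.children = [[[] for _ in range(n)] for _ in range(n)]
        self.updates = 0  # number of reachable pairs recorded

    def add_edge(self, i, j):
        if self.reach[i][j]:
            return  # nothing new: i already reaches j
        for x in range(self.n):
            # Only roots that reach i but not yet j gain new pairs.
            if self.reach[x][i] and not self.reach[x][j]:
                self._meld(x, j, i, j)

    def _meld(self, x, j, parent, node):
        # Attach `node` under `parent` in tree(x), then copy the part of
        # tree(j) below `node`, skipping subtrees x already reaches.
        self.reach[x][node] = True
        self.children[x][parent].append(node)
        self.updates += 1
        for child in self.children[j][node]:
            if not self.reach[x][child]:
                self._meld(x, j, node, child)

    def pairs(self):
        return {(x, v) for x in range(self.n) for v in range(self.n)
                if x != v and self.reach[x][v]}


closure = SpanningTreeClosure(4)
for src, dst in [(2, 3), (1, 2), (0, 1), (0, 2)]:
    closure.add_edge(src, dst)
print(len(closure.pairs()), closure.updates)
```

In this sketch every reachable pair is recorded once, because the walk over a tree visits each node at most once and stops at nodes the root already reaches. A worklist solver that joins every pair of adjacent edges has no such guarantee. The recursion in `_meld` is kept for brevity; a long chain would need an explicit stack.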

