Abstract
Inclusion-based set constraint solving is the most popular technique for whole-program points-to analysis, in which an analysis is typically formulated as repeatedly resolving constraints between the points-to sets of program variables. The set union operation is central to this process. The number of points-to sets can grow as analyses become more precise and input programs become larger, resulting in more time spent performing unions and more space used storing these points-to sets. Most existing approaches focus on improving the scalability of precise points-to analyses from an algorithmic perspective, and there has been less research into improving the data structures behind the analyses.
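To make the role of set union concrete, the following is a minimal sketch of a naive fixpoint solver for inclusion constraints. It is an illustration only, not the solver used in the paper: the function name `solve`, the four constraint kinds, and the tuple encoding are assumptions, and a practical solver would use a worklist rather than re-scanning every constraint.

```python
from collections import defaultdict


def solve(addr_of, copies, loads, stores):
    """Naive fixpoint solver for inclusion-based constraints (hypothetical helper).

    addr_of: (p, o) pairs for p = &o   => o is in pts(p)
    copies:  (p, q) pairs for p = q    => pts(q) is a subset of pts(p)
    loads:   (p, q) pairs for p = *q   => pts(o) is a subset of pts(p), for o in pts(q)
    stores:  (p, q) pairs for *p = q   => pts(q) is a subset of pts(o), for o in pts(p)
    """
    pts = defaultdict(set)
    for p, o in addr_of:
        pts[p].add(o)

    def union_into(dst, src):
        # The set union that dominates the cost of the analysis.
        before = len(pts[dst])
        pts[dst] |= pts[src]
        return len(pts[dst]) != before

    changed = True
    while changed:
        changed = False
        for p, q in copies:
            changed |= union_into(p, q)
        for p, q in loads:
            for o in list(pts[q]):
                changed |= union_into(p, o)
        for p, q in stores:
            for o in list(pts[p]):
                changed |= union_into(o, q)
    return dict(pts)


# p = &a; q = &b; *p = q; r = *p
result = solve(
    addr_of=[("p", "a"), ("q", "b")],
    copies=[],
    loads=[("r", "p")],
    stores=[("p", "q")],
)
```

Every constraint resolution step is a union of one points-to set into another, which is why the representation of those sets affects both time and space.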
Bit-vectors, one of the more popular data structures, have been used in several mainstream analysis frameworks to represent points-to sets. To store memory objects in bit-vectors, objects need to be mapped to integral identifiers. We observe that this object-to-identifier mapping is critical for a compact points-to set representation and for the set union operation. If objects in the same points-to sets (co-pointees) are not given numerically close identifiers, points-to resolution can cost significantly more space and time. Without data on the unpredictable points-to relations that the analysis would discover, producing an ideal mapping is extremely challenging.
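The effect of the mapping can be illustrated by counting the machine words a points-to set occupies under two different identifier assignments. This is a sketch under stated assumptions: a 64-bit word size, a sparse layout that stores only non-zero words, and a contiguous layout that stores every word between the first and last set bit. The helper names `sparse_words` and `dense_span_words` are hypothetical and do not correspond to any particular framework's API.

```python
WORD_BITS = 64  # assumed word size


def sparse_words(ids, word_bits=WORD_BITS):
    """Words needed when only words containing a set bit are stored."""
    return len({i // word_bits for i in ids})


def dense_span_words(ids, word_bits=WORD_BITS):
    """Words needed when every word from the first to the last set bit is stored."""
    if not ids:
        return 0
    words = [i // word_bits for i in ids]
    return max(words) - min(words) + 1


# The same three co-pointees under two object-to-identifier mappings.
close_ids = {0, 1, 2}            # numerically close identifiers
scattered_ids = {0, 500, 1000}   # numerically distant identifiers

close_sparse = sparse_words(close_ids)                 # 1 word
scattered_sparse = sparse_words(scattered_ids)         # 3 words
scattered_dense = dense_span_words(scattered_ids)      # 16 words
```

With close identifiers the set fits in a single word, so a union touches one word; with scattered identifiers the same set costs three words in the sparse layout and sixteen in the contiguous one.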
In this paper, we present a new approach to inclusion-based analysis by compacting points-to sets through object clustering. Inspired by recent staged analyses, in which an auxiliary analysis produces results approximating those of a more precise main analysis, we formulate points-to set compaction as an optimisation problem solved by integer programming, using constraints generated from the auxiliary analysis's results to produce an effective mapping. We then develop a more approximate mapping that is much more efficient to compute, using hierarchical clustering to compact bit-vectors. We also develop an improved representation of bit-vectors (called core bit-vectors) to take full advantage of the newly produced mapping. Our approach requires no algorithmic change to the points-to analysis. We evaluate our object clustering on flow-sensitive points-to analysis using 8 open-source programs (>3.1 million lines of LLVM instructions), and our results show that our approach improves the analysis with a speedup of up to 1.83× and a reduction in memory usage of up to 4.05×.
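The clustering idea can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes single-linkage agglomerative clustering with a Jaccard distance over the auxiliary points-to sets in which each object appears, and it assigns consecutive identifiers in the order the clusters are merged. The names `jaccard_distance` and `cluster_mapping` are hypothetical, and the pairwise search is quadratic per merge, so it is only suitable for small examples.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as maximally distant."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 1.0


def cluster_mapping(aux_pts_sets):
    """Map objects to consecutive ids so that co-pointees end up adjacent."""
    # For each object, record which auxiliary points-to sets contain it.
    membership = {}
    for index, pts in enumerate(aux_pts_sets):
        for obj in pts:
            membership.setdefault(obj, set()).add(index)

    # Each cluster is an ordered list of objects; start with singletons.
    clusters = [[obj] for obj in sorted(membership)]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(membership[x], membership[y])
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    order = clusters[0] if clusters else []
    return {obj: ident for ident, obj in enumerate(order)}


# Points-to sets from a (hypothetical) auxiliary analysis.
mapping = cluster_mapping([{"a", "c"}, {"b", "d"}, {"a", "c"}])
```

In this example `a` and `c` always appear together, as do `b` and `d`, so each pair receives adjacent identifiers and each points-to set occupies a single bit-vector word.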
Index Terms
Compacting points-to sets through object clustering