Abstract
Computing a precise points-to analysis for very large Java programs remains challenging despite the large body of research on points-to analysis. Any approach must solve an underlying dynamic graph reachability problem, for which the best known algorithms have near-cubic worst-case runtime complexity; hence, previous work does not scale to programs with millions of lines of code. In this work, we present a novel approach for solving the field-sensitive points-to problem for Java by means of (1) a transitive-closure data structure and (2) a pre-computed set of potentially matching load/store pairs that accelerates the fixed-point calculation. Experiments on Java benchmarks show that our approach outperforms standard context-free language reachability implementations. Our approach computes a points-to index for the OpenJDK with over 1.5 billion tuples in under a minute.
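To make the problem statement concrete, the following is a minimal sketch of a naive field-sensitive, inclusion-based points-to fixed point. It is not the algorithm of this paper: it re-propagates sets in a loop instead of maintaining a transitive-closure data structure, and it only borrows the idea of pre-computing candidate load/store pairs (here, pairs that access the same field). The function name `points_to` and the tuple encodings of statements are assumptions made for illustration.

```python
from collections import defaultdict


def points_to(allocs, assigns, loads, stores):
    """Naive field-sensitive points-to fixed point (illustrative only).

    allocs:  (var, obj)         for  var = new obj
    assigns: (dst, src)         for  dst = src
    loads:   (dst, base, field) for  dst = base.field
    stores:  (base, field, src) for  base.field = src
    """
    # Pre-compute candidate load/store pairs: only accesses to the
    # same field can ever match.
    stores_by_field = defaultdict(list)
    for base, field, src in stores:
        stores_by_field[field].append((base, src))
    pairs = [
        (dst, load_base, store_base, src)
        for dst, load_base, field in loads
        for store_base, src in stores_by_field[field]
    ]

    pts = defaultdict(set)
    for var, obj in allocs:
        pts[var].add(obj)

    edges = set(assigns)  # (dst, src) means pts[dst] must include pts[src]
    changed = True
    while changed:
        changed = False
        # A candidate pair matches once its two base variables may alias,
        # i.e. their points-to sets share at least one object.
        for dst, load_base, store_base, src in pairs:
            if (dst, src) not in edges and pts[load_base] & pts[store_base]:
                edges.add((dst, src))
                changed = True
        # Propagate points-to sets along all subset edges.
        for dst, src in edges:
            if not pts[src] <= pts[dst]:
                pts[dst] |= pts[src]
                changed = True
    return {var: objs for var, objs in pts.items() if objs}


# Example program:  a = new o1; v = new o2; c = new o3;
#                   b = a; a.f = v; x = b.f; y = c.f
result = points_to(
    allocs=[("a", "o1"), ("v", "o2"), ("c", "o3")],
    assigns=[("b", "a")],
    loads=[("x", "b", "f"), ("y", "c", "f")],
    stores=[("a", "f", "v")],
)
```

In this example `x` may point to `o2` because `a` and `b` may alias, while `y` stays empty because `c` never aliases `a`. Restricting the loop to same-field pairs avoids testing loads against stores that can never match, which is the role the abstract assigns to the pre-computed pair set.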
Supplemental Material
Source code and datasets used in our experimental evaluation. This is a modified version of the artefact submitted for evaluation by the review committee. Proprietary software and datasets have been removed.
Giga-scale exhaustive points-to analysis for Java in under a minute
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications