Abstract
The core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. Our approach applies to parametric static analyses implemented in Datalog, and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdos--Renyi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters, by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).
- Miltiadis Allamanis, Earl T. Barr, Christian Bird, and Charles A. Sutton. Learning natural coding conventions. In FSE, 2014. Google Scholar
Digital Library
- Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. Bimodal modelling of source code and natural language. In ICML, 2015.Google Scholar
Digital Library
- Carlos Ans´otegui, Maria Luisa Bonet, and Jordi Levy. SAT-based MaxSAT algorithms. Artif. Intell., 196:77–105, 2013. Google Scholar
Digital Library
- T. Ball and S. Rajamani. The SLAM project: Debugging system software via static analysis. In POPL, 2002. Google Scholar
Digital Library
- Nels E. Beckman and Aditya V. Nori. Probabilistic, modular and scalable inference of typestate specifications. In PLDI, 2011. Google Scholar
Digital Library
- M. Bravenboer and Y. Smaragdakis. Strictly declarative specification of sophisticated points-to analyses. In OOPSLA, 2009. Google Scholar
Digital Library
- S. Chaki, E. Clarke, A. Groce, S. Jha, and H. Veith. Modular verification of software components in C. In ICSE, 2003. Google Scholar
Digital Library
- E. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexampleguided abstraction refinement for symbolic model checking. JACM, 50(5), 2003. Google Scholar
Digital Library
- E. Clarke, A. Gupta, J. Kukula, and O. Shrichman. SAT based abstraction-refinement using ILP and machine learning techniques. In CAV, 2002. Google Scholar
Digital Library
- Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google Scholar
Digital Library
- Paul Erd˝os and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 1960.Google Scholar
- Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. To appear in Theory and Practice of Logic Programming, 2013.Google Scholar
- Lise Getoor and Ben Taskar, editors. Introduction to Statistical Relational Learning. The MIT Press, 2007. Google Scholar
Digital Library
- Roberto Giacobazzi, Francesco Logozzo, and Francesco Ranzato. Analyzing program analyses. In POPL, 2015. Google Scholar
Digital Library
- Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making abstract interpretations complete. J. ACM, 2000. Google Scholar
Digital Library
- S. Grebenshchikov, A. Gupta, N. Lopes, C. Popeea, and A. Rybalchenko. HSF(C): A software verifier based on Horn clauses. In TACAS, 2012. Google Scholar
Digital Library
- Sergey Grebenshchikov, Nuno P. Lopes, Corneliu Popeea, and Andrey Rybalchenko. Synthesizing software verifiers from proof rules. In PLDI, 2012. Google Scholar
Digital Library
- Anubhav Gupta. Learning Abstractions for Model Checking. PhD thesis, Carnegie Mellon, 2006. Google Scholar
Digital Library
- Bernd Gutmann, Ingo Thon, and Luc De Raedt. Learning the parameters of probabilistic logic programs from interpretations. In ECML PKDD, 2011. Google Scholar
Digital Library
- T. Henzinger, R. Jhala, R. Majumdar, and K. McMillan. Abstractions from proofs. In POPL, 2004. Google Scholar
Digital Library
- Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar T. Devanbu. On the naturalness of software. In ICSE, 2012. Google Scholar
Digital Library
- Mikoláˇs Janota. MiFuMax — a literate MaxSat solver. http: //sat.inesc-id.pt/~mikolas/sw/mifumax/, 2013.Google Scholar
- Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001–. {Online; accessed 2015-07-06}.Google Scholar
- Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 1999. Google Scholar
Digital Library
- Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrasebased statistical translation of programming languages. In Onward!, 2014. Google Scholar
Digital Library
- Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, 1972.Google Scholar
Cross Ref
- Ted Kremenek, Andrew Y. Ng, and Dawson R. Engler. A factor graph model for software bug finding. In IJCAI, 2007. Google Scholar
Digital Library
- Joohyung Lee. A model-theoretic counterpart of loop formulas. In IJCAI, 2005. Google Scholar
Digital Library
- Percy Liang and Mayur Naik. Scaling abstraction refinement via pruning. In PLDI, 2011. Google Scholar
Digital Library
- Ben Liblit, Alexander Aiken, Alice X. Zheng, and Michael I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google Scholar
Digital Library
- Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander Aiken, and Michael I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google Scholar
Digital Library
- Vladimir Lifschitz and Alexander A. Razborov. Why are there so many loop formulas? ACM Trans. Comput. Log., 7(2):261–268, 2006. Google Scholar
Digital Library
- V. Benjamin Livshits, Aditya V. Nori, Sriram K. Rajamani, and Anindya Banerjee. Merlin: specification inference for explicit information flow problems. In PLDI, 2009. Google Scholar
Digital Library
- Alexey Loginov, Thomas W. Reps, and Shmuel Sagiv. Abstraction refinement via inductive learning. In CAV, 2005. Google Scholar
Digital Library
- Chris J. Maddison and Daniel Tarlow. Structured generative models of natural source code. In ICML, 2014.Google Scholar
- Jo˜ao Marques-Silva, Federico Heras, Mikolás Janota, Alessandro Previti, and Anton Belov. On computing minimal correction subsets. In IJCAI, 2013.Google Scholar
Digital Library
- Alon Mishne, Sharon Shoham, and Eran Yahav. Typestate-based semantic code search over partial programs. In OOPSLA, 2012. Google Scholar
Digital Library
- Ant´onio Morgado, Federico Heras, Mark H. Liffiton, Jordi Planes, and Jo˜ao Marques-Silva. Iterative and core-guided MaxSAT solving: A survey and assessment. Constraints, 2013. Google Scholar
Digital Library
- Mayur Naik. jChord. https://bitbucket.org/pag-lab/jchord.Google Scholar
- Mayur Naik, Hongseok Yang, Ghila Castelnuovo, and Mooly Sagiv. Abstractions from tests. In POPL, 2012. Google Scholar
Digital Library
- Aditya V. Nori and Rahul Sharma. Termination proofs from tests. In FSE, 2013. Google Scholar
Digital Library
- Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, and Kwangkeun Yi. Selective context-sensitivity guided by impact preanalysis. In PLDI, 2014. Google Scholar
Digital Library
- Veselin Raychev, Martin T. Vechev, and Andreas Krause. Predicting program properties from “Big Code”. In POPL, 2015. Google Scholar
Digital Library
- Veselin Raychev, Martin T. Vechev, and Eran Yahav. Code completion with statistical language models. In PLDI, 2014. Google Scholar
Digital Library
- Andrey Rybalchenko and Viorica Sofronie-Stokkermans. Constraint solving for interpolation. In VMCAI, 2007. Google Scholar
Digital Library
- Sriram Sankaranarayanan, Franjo Ivancic, and Aarti Gupta. Mining library specifications using inductive logic programming. In ICSE, 2008. Google Scholar
Digital Library
- Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In ICLP, 1995.Google Scholar
- Rahul Sharma, Aditya V. Nori, and Alex Aiken. Interpolants as classifiers. In CAV, 2012. Google Scholar
Digital Library
- Y. Smaragdakis and M. Bravenboer. Using Datalog for fast and easy program analysis. In Datalog 2.0 Workshop, 2010. Google Scholar
Digital Library
- Yannis Smaragdakis, George Kastrinis, and George Balatsouras. Introspective analysis: context-sensitivity, across the board. In PLDI, 2014. Google Scholar
Digital Library
- Tachio Terauchi. Explaining the effectiveness of small refinement heuristics in program verification with CEGAR. In SAS, 2015.Google Scholar
Cross Ref
- Martin J. Wainwright and Michael I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 2008. Google Scholar
Digital Library
- J. Whaley, D. Avots, M. Carbin, and M. Lam. Using Datalog with binary decision diagrams for program analysis. In APLAS, 2005. Google Scholar
Digital Library
- Kwangkeun Yi, Hosik Choi, Jaehwang Kim, and Yongdai Kim. An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett., 2007. Google Scholar
Digital Library
- X. Zhang, R. Mangal, R. Grigore, M. Naik, and H. Yang. On abstraction refinement for program analyses in Datalog. In PLDI, 2014. Google Scholar
Digital Library
- Xin Zhang, Mayur Naik, and Hongseok Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, 2013. Google Scholar
Digital Library
- Alice X. Zheng, Michael I. Jordan, Ben Liblit, Mayur Naik, and Alex Aiken. Statistical debugging: simultaneous identification of multiple bugs. In ICML, 2006. Google Scholar
Digital Library
Index Terms
Abstraction refinement guided by a learnt probabilistic model
Recommendations
Abstraction refinement guided by a learnt probabilistic model
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesThe core challenge in designing an effective static program analysis is to find a good program abstraction -- one that retains only details relevant to a given query. In this paper, we present a new approach for automatically finding such an ...
Probabilistic points-to analysis for java
CC'11/ETAPS'11: Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of softwareProbabilistic points-to analysis is an analysis technique for defining the probabilities on the points-to relations in programs. It provides the compiler with some optimization chances such as speculative dead store elimination, speculative redundancy ...
Counterexample-Guided Abstraction Refinement for Component-Based Systems
COMPSAC '14: Proceedings of the 2014 IEEE 38th Annual Computer Software and Applications ConferenceCompositional verification of invariants is a technique for alleviating the state explosion problem in component-based verification. The efficiency of these methods depends on abstraction, which leads to verification incompleteness. In this paper, we ...






Comments