Adaptive Static Analysis via Learning with Bayesian Optimization

Published: 16 November 2018

Abstract

Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should adapt to a given analysis task and avoid using techniques that unnecessarily improve precision and increase analysis cost. However, achieving this ideal is highly nontrivial, and it requires a large amount of engineering effort.

In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether to apply a precision-improving technique to that part or not. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost by only 3.3× relative to the baseline flow- and context-insensitive analysis, rather than by 40× or more as the fully sensitive version does.
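To make the idea of a parameterized strategy concrete, the following is a minimal sketch and not the analyzer's actual implementation. It assumes, purely for illustration, that each program part is described by a numeric feature vector, that the parameter `w` is a weight vector scoring parts by a dot product, and that precision is applied to the `k` highest-scoring parts. The names `score`, `choose_parts`, `objective`, and `count_proved` are hypothetical; the abstract does not specify the form of the strategy or of the learning objective.

```python
from typing import Callable, Dict, List, Sequence, Set

Parts = Dict[str, Sequence[float]]  # part name -> feature vector (assumed encoding)


def score(features: Sequence[float], w: Sequence[float]) -> float:
    """Linear score of one program part under parameter w (an assumption)."""
    return sum(f * wi for f, wi in zip(features, w))


def choose_parts(parts: Parts, w: Sequence[float], k: int) -> Set[str]:
    """Select the k highest-scoring parts; ties are broken by name."""
    ranked = sorted(parts, key=lambda name: (-score(parts[name], w), name))
    return set(ranked[:k])


def objective(
    w: Sequence[float],
    codebase: List[Parts],
    k: int,
    count_proved: Callable[[Parts, Set[str]], int],
) -> int:
    """Total number of queries proved over the codebase for a given w.

    count_proved stands in for running the static analyzer with precision
    applied only to the selected parts; it is a hypothetical callback.
    An optimizer (Bayesian optimization in the article) would search for
    the w that maximizes this value.
    """
    return sum(count_proved(parts, choose_parts(parts, w, k)) for parts in codebase)


# Toy usage with made-up features and a made-up stand-in for the analyzer.
parts = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
w = [0.5, -1.0]
print(sorted(choose_parts(parts, w, 2)))  # ['x', 'z']
```

In this sketch, evaluating `objective` once means analyzing every program in the codebase, which is why a sample-efficient search over `w` matters.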

Published in

ACM Transactions on Programming Languages and Systems, Volume 40, Issue 4
December 2018
191 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/3292525

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 November 2018
• Accepted: 1 June 2017
• Revised: 1 April 2017
• Received: 1 March 2016

Qualifiers

• research-article
• Research
• Refereed
