Abstract
Increasing the precision of a program analysis often leads to worse results. Our thesis is that this phenomenon results from fundamental limits on the ability to use precise abstract domains as the basis for inferring strong invariants of programs. We show that bias-variance tradeoffs, an idea from learning theory, explain why more precise abstractions do not necessarily lead to better results, and that they also provide practical techniques for coping with such limitations. Learning theory captures precision using a combinatorial quantity called the VC dimension. We compute the VC dimension for different abstractions and report on its usefulness as a precision metric for program analyses. We evaluate cross validation, a technique for addressing bias-variance tradeoffs, on an industrial-strength program verification tool called YOGI. The tool produced using cross validation has significantly better running time, finds new defects, and has fewer time-outs than the current production version. Finally, we make some recommendations for tackling bias-variance tradeoffs in program analysis.
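To make the notion of VC dimension concrete, the following minimal sketch checks by brute force which point sets are shattered by closed intervals on the integer line, a concept class that loosely resembles an interval abstract domain over a single variable. It is an illustration only and is not the computation performed in the paper; the function names `interval_labelings`, `is_shattered`, and `vc_dimension` are hypothetical and chosen for this example.

```python
from itertools import combinations


def interval_labelings(points):
    """Return every subset of `points` that some closed interval [lo, hi] selects."""
    pts = sorted(points)
    picked = {frozenset()}  # an interval containing no point selects nothing
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            # An interval can only select a contiguous run of the sorted points.
            picked.add(frozenset(pts[i:j + 1]))
    return picked


def is_shattered(points):
    """True if intervals realize all 2^n subsets of the n given points."""
    return len(interval_labelings(points)) == 2 ** len(points)


def vc_dimension(universe, max_size):
    """Largest k <= max_size such that some k-subset of `universe` is shattered."""
    best = 0
    for k in range(1, max_size + 1):
        if any(is_shattered(s) for s in combinations(universe, k)):
            best = k
        else:
            # Every subset of a shattered set is shattered, so once no
            # k-subset is shattered, no larger subset can be either.
            break
    return best


if __name__ == "__main__":
    print(vc_dimension(range(6), 4))
```

Under these assumptions the search reports a VC dimension of 2: any two distinct points can be labeled in all four ways, but with three points no interval can select the two outer points while excluding the middle one.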
Bias-variance tradeoffs in program analysis
POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages