Abstract
Program analyses necessarily make approximations, so they often report true alarms interspersed with many false alarms. We propose a new approach that leverages user feedback to guide program analyses towards true alarms and away from false alarms. Our approach associates each alarm with a confidence value by performing Bayesian inference on a probabilistic model derived from the analysis rules. In each iteration, the user inspects the alarm with the highest confidence and labels its ground truth, and the approach recomputes the confidences of the remaining alarms given this feedback. It thereby maximizes the return on the user's effort in inspecting each alarm. We have implemented our approach in a tool named Bingo for program analyses expressed in Datalog. Experiments with real users and two sophisticated analyses (a static datarace analysis for Java programs and a static taint analysis for Android apps) show significant improvements on a range of metrics, including false alarm rates and number of bugs found.
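The interaction loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy derivation graph (`CLAUSES`), the tuple names, the uniform rule-firing probability `FIRE_PROB`, and the use of brute-force enumeration for inference. The abstract does not specify Bingo's actual model or inference procedure, so this should not be read as the tool's implementation.

```python
from itertools import product

# Hypothetical grounded Datalog clauses as (head, body); an empty body marks
# an input fact. alarm1 and alarm2 share the premise P; alarm3 depends on Q.
CLAUSES = [
    ("P", ()),
    ("Q", ()),
    ("alarm1", ("P",)),
    ("alarm2", ("P",)),
    ("alarm3", ("Q",)),
]
ALARMS = ["alarm1", "alarm2", "alarm3"]
FIRE_PROB = 0.9  # assumed probability that a clause holds when its body does


def derived(fires):
    """Least fixpoint of the clauses that fire in this possible world."""
    true = set()
    changed = True
    while changed:
        changed = False
        for (head, body), fire in zip(CLAUSES, fires):
            if fire and head not in true and all(b in true for b in body):
                true.add(head)
                changed = True
    return true


def confidences(labels):
    """P(alarm is true | user labels) for each unlabelled alarm.

    Exact inference by enumerating all 2^n clause-firing assignments;
    only feasible for toy graphs like this one.
    """
    evidence_mass = 0.0
    mass = {a: 0.0 for a in ALARMS if a not in labels}
    for fires in product([True, False], repeat=len(CLAUSES)):
        weight = 1.0
        for fire in fires:
            weight *= FIRE_PROB if fire else 1.0 - FIRE_PROB
        world = derived(fires)
        # Discard worlds that contradict the feedback collected so far.
        if any((a in world) != truth for a, truth in labels.items()):
            continue
        evidence_mass += weight
        for a in mass:
            if a in world:
                mass[a] += weight
    return {a: m / evidence_mass for a, m in mass.items()}


def triage(oracle):
    """Repeatedly show the most confident alarm and condition on its label."""
    labels, order = {}, []
    while len(labels) < len(ALARMS):
        conf = confidences(labels)
        top = max(conf, key=conf.get)
        order.append(top)
        labels[top] = oracle[top]  # stands in for the user's inspection
    return order


if __name__ == "__main__":
    print(confidences({}))
    print(confidences({"alarm1": False}))
    print(triage({"alarm1": False, "alarm2": False, "alarm3": True}))
```

In this toy model, labelling `alarm1` as false lowers the confidence of `alarm2`, which shares the premise `P`, while leaving `alarm3` unchanged. That is the generalization effect the abstract relies on: one label shifts the ranking of the remaining, related alarms.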
Supplemental Material
This is the appendix that accompanies the paper "User-Guided Program Reasoning using Bayesian Inference".
User-guided program reasoning using Bayesian inference
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation