Artifacts Evaluated & Functional

User-guided program reasoning using Bayesian inference

Published: 11 June 2018

Abstract

Program analyses necessarily make approximations that often lead them to report true alarms interspersed with many false alarms. We propose a new approach to leverage user feedback to guide program analyses towards true alarms and away from false alarms. Our approach associates each alarm with a confidence value by performing Bayesian inference on a probabilistic model derived from the analysis rules. In each iteration, the user inspects the alarm with the highest confidence and labels its ground truth, and the approach recomputes the confidences of the remaining alarms given this feedback. It thereby maximizes the return on the effort by the user in inspecting each alarm. We have implemented our approach in a tool named Bingo for program analyses expressed in Datalog. Experiments with real users and two sophisticated analyses (a static datarace analysis for Java programs and a static taint analysis for Android apps) show significant improvements on a range of metrics, including false alarm rates and number of bugs found.
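The interaction loop described in the abstract can be illustrated with a minimal sketch. This is not Bingo's implementation: the model below is a hand-written toy rather than one derived from Datalog analysis rules, and inference is done by brute-force enumeration. All names and numbers (`PRIOR`, `PARENT`, `P_TRUE_GIVEN`, the facts `f1`/`f2`, the alarms `a1`..`a3`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy model (an assumption, not taken from the paper):
# each alarm depends on one hidden intermediate fact of the analysis.
PRIOR = {"f1": 0.6, "f2": 0.4}                 # P(fact holds)
PARENT = {"a1": "f1", "a2": "f1", "a3": "f2"}  # alarm -> fact it depends on
P_TRUE_GIVEN = {True: 0.9, False: 0.1}         # P(alarm is real | fact value)


def joint_weight(facts, labels):
    """Probability of one assignment to the hidden facts together with the labels."""
    w = 1.0
    for name, value in facts.items():
        w *= PRIOR[name] if value else 1.0 - PRIOR[name]
    for alarm, is_true in labels.items():
        p = P_TRUE_GIVEN[facts[PARENT[alarm]]]
        w *= p if is_true else 1.0 - p
    return w


def confidences(labels):
    """P(alarm is real | labels so far) for every unlabelled alarm, by enumeration."""
    names = list(PRIOR)
    assignments = [dict(zip(names, values))
                   for values in product([True, False], repeat=len(names))]
    evidence = sum(joint_weight(a, labels) for a in assignments)
    result = {}
    for alarm in PARENT:
        if alarm in labels:
            continue
        numerator = sum(joint_weight(a, {**labels, alarm: True}) for a in assignments)
        result[alarm] = numerator / evidence
    return result


def interactive_ranking(oracle):
    """Repeatedly show the highest-confidence alarm, record its label, and re-rank."""
    labels, order = {}, []
    while len(labels) < len(PARENT):
        conf = confidences(labels)
        top = max(sorted(conf), key=conf.get)  # ties broken by alarm name
        order.append(top)
        labels[top] = oracle[top]              # stands in for the user's inspection
    return order


if __name__ == "__main__":
    # a1 and a2 are false alarms sharing a cause; a3 is a real bug.
    print(interactive_ranking({"a1": False, "a2": False, "a3": True}))
```

In this toy setting, `a1` and `a2` start ahead of `a3`, but once `a1` is labelled a false alarm the confidence of `a2` (which shares the same hidden fact) drops below that of `a3`, so the real bug is inspected second rather than last. Exact enumeration is exponential in the number of hidden facts, so it is only suitable for a small example like this one.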


