Abstract
In C programs, error specifications, which give the range of values that each function returns to indicate failure, are widely used to check and propagate errors for the sake of reliability and security. Various kinds of C analyzers rely on error specifications for different purposes, e.g., to detect error-handling bugs, yet a general approach for generating precise specifications is still missing, which limits the applicability of those tools.
In this paper, we solve this problem by developing a machine learning-based approach named MLPEx. It generates error specifications by analyzing only the source code, and is thus general. We propose a novel machine learning paradigm based on transfer learning, enabling MLPEx to require only one-time minimal data labeling from us (as the tool developers) and zero manual labeling efforts from users. To improve the accuracy of generated error specifications, MLPEx extracts and exploits project-specific information. We evaluate MLPEx on 10 projects, including 6 libraries and 4 applications. An investigation of 3,443 functions and 17,750 paths reveals that MLPEx generates error specifications with a precision of 91% and a recall of 94%, significantly higher than those of state-of-the-art approaches. To further demonstrate the usefulness of the generated error specifications, we use them to detect 57 bugs in 5 tested projects.
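To make the notion of an error specification concrete, the following minimal C sketch shows a function whose failures are signaled through its return value, a specification recording that range, and a caller that checks and propagates the error. All names here (`error_spec`, `parse_port`, `parse_port_spec`, `is_error`, `open_listener`) are hypothetical and chosen only for illustration; this is not MLPEx's internal representation or output format.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical representation of an error specification: the closed
 * interval [lo, hi] of return values that signal failure. */
typedef struct {
    long lo;
    long hi;
} error_spec;

/* Hypothetical API function: parses a TCP port number.
 * Returns the port (0..65535) on success and -1 on failure. */
long parse_port(const char *text)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return -1;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 65535)
        return -1;
    return value;
}

/* Assumed specification for parse_port: any negative value is an error. */
static const error_spec parse_port_spec = { LONG_MIN, -1 };

/* True if `ret` lies inside the error range described by `spec`. */
bool is_error(const error_spec *spec, long ret)
{
    assert(spec != NULL);
    return ret >= spec->lo && ret <= spec->hi;
}

/* A caller that checks the error and propagates it upward. */
int open_listener(const char *port_text)
{
    long port = parse_port(port_text);

    if (is_error(&parse_port_spec, port))
        return -1; /* propagate the failure to our own caller */
    /* ... a real program would bind a socket to `port` here ... */
    return 0;
}
```

An analyzer that knows `parse_port_spec` could, for instance, flag a call site that uses the result of `parse_port` without first testing whether it falls in the error range.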
Generating precise error specifications for C: a zero shot learning approach