Research article · Open Access · Results Reproduced

DeepBugs: a learning approach to name-based bug detection

Published: 24 October 2018

Abstract

Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names using a semantic representation and learns bug detectors automatically instead of relying on manually written ones. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. Because effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement the idea in a framework for learning-based, name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with a 68% true positive rate) in real-world code.
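To make the data-generation step more concrete, the following is a minimal sketch of how likely incorrect examples could be seeded from correct code for the three kinds of bugs named above. It assumes a simplified, hand-built representation of call sites and binary operations instead of a real JavaScript AST. The names `Call`, `BinOp`, `OPERATORS`, `seed_swapped_args`, `seed_wrong_operator`, `seed_wrong_operand`, and `swapped_arg_dataset`, as well as the label convention (0 = correct, 1 = likely buggy), are hypothetical illustrations and are not taken from the DeepBugs implementation.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


# Hypothetical, simplified stand-ins for JavaScript AST nodes. A real tool
# would extract these from parsed source files; here they are built by hand.
@dataclass(frozen=True)
class Call:
    callee: str            # name of the called function, e.g. "setSize"
    args: Tuple[str, ...]  # arguments as names, e.g. ("width", "height")


@dataclass(frozen=True)
class BinOp:
    left: str   # left operand, e.g. "i"
    op: str     # binary operator, e.g. "<"
    right: str  # right operand, e.g. "length"


# Assumed operator vocabulary; JavaScript has more binary operators than this.
OPERATORS = ["+", "-", "*", "/", "%", "<", "<=", ">", ">=",
             "==", "===", "!=", "!==", "&&", "||"]


def seed_swapped_args(call: Call) -> Call:
    """Likely-buggy variant: swap the first two arguments of a call."""
    first, second, *rest = call.args
    return replace(call, args=(second, first, *rest))


def seed_wrong_operator(expr: BinOp, rng: random.Random) -> BinOp:
    """Likely-buggy variant: replace the operator with a different one."""
    candidates = [op for op in OPERATORS if op != expr.op]
    return replace(expr, op=rng.choice(candidates))


def seed_wrong_operand(expr: BinOp, names: List[str],
                       rng: random.Random) -> Optional[BinOp]:
    """Likely-buggy variant: replace one operand with another name from the
    same file. Returns None if no different name is available."""
    side = rng.choice(["left", "right"])
    current = getattr(expr, side)
    candidates = [n for n in names if n != current]
    if not candidates:
        return None
    return replace(expr, **{side: rng.choice(candidates)})


def swapped_arg_dataset(calls: List[Call]) -> List[Tuple[Call, int]]:
    """Binary-classification data: each original call is labeled 0 (correct)
    and its swapped variant 1 (likely buggy). Calls with fewer than two
    arguments, or with identical first two arguments, are skipped because
    swapping would not change them."""
    data: List[Tuple[Call, int]] = []
    for call in calls:
        if len(call.args) < 2 or call.args[0] == call.args[1]:
            continue
        data.append((call, 0))
        data.append((seed_swapped_args(call), 1))
    return data


if __name__ == "__main__":
    calls = [Call("setSize", ("width", "height")), Call("log", ("msg",))]
    for example, label in swapped_arg_dataset(calls):
        print(label, example.callee, example.args)
    # 0 setSize ('width', 'height')
    # 1 setSize ('height', 'width')
```

In this sketch, `swapped_arg_dataset` turns the call `setSize(width, height)` into one positive and one negative training example, while `log(msg)` is skipped because it has only one argument. A binary classifier trained on such labeled pairs, using some representation of the identifier names, plays the role of the learned bug detector; the semantic name representation and the classifier itself are not modeled here.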


Supplemental Material

a147-pradel.webm
