research-article
Open Access

Getafix: learning to fix bugs automatically

Published: 10 October 2019

Abstract

Static analyzers help find bugs early by warning about recurring bug categories. While fixing these bugs remains a mostly manual task in practice, we observe that fixes for a specific bug category are often repetitive. This paper addresses the problem of automatically fixing instances of common bugs by learning from past fixes. We present Getafix, an approach that produces human-like fixes while being fast enough to suggest fixes in time proportional to the amount of time needed to obtain static analysis results in the first place.

Getafix is based on a novel hierarchical clustering algorithm that summarizes fix patterns into a hierarchy ranging from general to specific patterns. Instead of an expensive exploration of a potentially large space of candidate fixes, Getafix relies on a simple yet effective ranking technique that uses the context of a code change to select the most appropriate fix for a given bug.
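
To make the idea of a general-to-specific pattern hierarchy concrete, the following is a minimal sketch and not Getafix's implementation. It assumes that edits are encoded as nested tuples `(label, child, ...)`, that two or more edits are generalized by anti-unification (differing subtrees become holes named `?h0`, `?h1`, ...), and that clusters are merged greedily by how many concrete nodes the merged pattern keeps. The tuple encoding, the `size` score, and the names `anti_unify` and `cluster` are all illustrative assumptions.

```python
def anti_unify(trees, holes=None):
    """Most specific pattern matching every tree; differing parts become holes."""
    holes = {} if holes is None else holes
    first = trees[0]
    if all(t == first for t in trees):
        return first
    same_shape = all(
        isinstance(t, tuple) and t[0] == first[0] and len(t) == len(first)
        for t in trees
    )
    if same_shape:
        return (first[0],) + tuple(
            anti_unify([t[i] for t in trees], holes) for i in range(1, len(first))
        )
    # Reuse one hole for the same combination of subtrees, so that a variable
    # occurring several times in every edit maps to a single hole.
    return holes.setdefault(tuple(trees), f"?h{len(holes)}")


def size(tree):
    """Count concrete (non-hole) nodes; a crude specificity score (assumption)."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 0 if tree.startswith("?h") else 1


def cluster(edits):
    """Greedy bottom-up merging; the returned root holds the most general pattern."""
    nodes = [{"pattern": e, "members": [e], "children": []} for e in edits]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        # Merge the pair whose joint pattern stays the most specific.
        i, j = max(
            pairs,
            key=lambda p: size(
                anti_unify(nodes[p[0]]["members"] + nodes[p[1]]["members"])
            ),
        )
        members = nodes[i]["members"] + nodes[j]["members"]
        parent = {
            "pattern": anti_unify(members),
            "members": members,
            "children": [nodes[i], nodes[j]],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


# Hypothetical edits of the form ("edit", before, after).
e1 = ("edit", ("call", "x", "foo"),
      ("if", ("neq", "x", "null"), ("call", "x", "foo")))
e2 = ("edit", ("call", "list", "size"),
      ("if", ("neq", "list", "null"), ("call", "list", "size")))
e3 = ("edit", ("call", "y", "bar"),
      ("ternary", ("neq", "y", "null"), ("call", "y", "bar"), "null"))

root = cluster([e1, e2, e3])
print(root["pattern"])                 # general pattern covering all three edits
print(root["children"][1]["pattern"])  # more specific null-guard pattern
```

In this toy run the two null-guard edits are merged first, which gives a specific pattern in which the receiver is a single hole used three times. The root then generalizes over all three edits and keeps only the shape of the code before the change. How Getafix scores, stores, and ranks such patterns is not covered by this sketch.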

Our evaluation applies Getafix to 1,268 bug fixes for six bug categories reported by popular static analyzers for Java, including null dereferences, incorrect API calls, and misuses of particular language constructs. The approach predicts exactly the human-written fix as the top-most suggestion between 12% and 91% of the time, depending on the bug category. The top-5 suggestions contain fixes for 526 of the 1,268 bugs. Moreover, we report on deploying the approach within Facebook, where it contributes to the reliability of software used by billions of people. To the best of our knowledge, Getafix is the first industrially deployed automated bug-fixing tool that learns fix patterns from past, human-written fixes to produce human-like fixes.
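
The top-1 and top-5 figures above can be read as exact-match accuracy over ranked suggestion lists. The sketch below shows one way to compute such a metric. The function name `top_k_accuracy` and the toy data are hypothetical and not taken from the paper, and the sketch assumes that a hit means the human-written fix appears verbatim among the first k suggestions.

```python
def top_k_accuracy(ranked_suggestions, human_fixes, k):
    """Fraction of bugs whose human fix appears verbatim in the first k suggestions."""
    hits = sum(
        fix in ranked[:k]
        for ranked, fix in zip(ranked_suggestions, human_fixes)
    )
    return hits / len(human_fixes)


# Toy data (assumption): one ranked list of suggested fixes per bug.
suggestions = [
    ["if (x != null) x.foo();", "x.foo();"],
    ["return;", "if (y == null) return;"],
    ["a.equals(b)"],
]
human = [
    "if (x != null) x.foo();",
    "if (y == null) return;",
    "Objects.equals(a, b)",
]

top1 = top_k_accuracy(suggestions, human, k=1)
top5 = top_k_accuracy(suggestions, human, k=5)
print(top1, top5)
```

Under this definition, 526 hits out of 1,268 bugs would give a top-5 value of roughly 0.41.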

    • Published in

      Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA
      October 2019
      2077 pages
      EISSN: 2475-1421
      DOI: 10.1145/3366395

      Copyright © 2019 Owner/Author

      Publisher

      Association for Computing Machinery

      New York, NY, United States
