skip to main content
research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Learning to blame: localizing novice type errors with data-driven diagnosis

Published:12 October 2017Publication History
Skip Abstract Section

Abstract

Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed" versions -- to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.

Skip Supplemental Material Section

Supplemental Material

References

  1. Rui Abreu, Peter Zoeteweij, and Arjan J C van Gemund. 2006. An Evaluation of Similarity Coefficients for Software Fault Localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC ’06). 39–46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Rui Abreu, Peter Zoeteweij, and Arjan J C van Gemund. 2007. On the Accuracy of Spectrum-based Fault Localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007). 89–98. Google ScholarGoogle ScholarCross RefCross Ref
  3. Mike Beaven and Ryan Stansifer. 1993. Explaining Type Errors in Polymorphic Languages. ACM Lett. Program. Lang. Syst. 2, 1-4 (March 1993), 17–30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Karen L Bernstein and Eugene W Stark. 1995. Debugging Type Errors. Technical Report. State University of New York at Stony Brook.Google ScholarGoogle Scholar
  5. Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic Model for Code. In Proceedings of the 33rd International Conference on Machine Learning (ICML ’16).Google ScholarGoogle Scholar
  6. Leo Breiman. 2001. Random Forests. Mach. Learn. 45, 1 (1 Oct. 2001), 5–32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. Classification and regression trees. CRC press.Google ScholarGoogle Scholar
  8. M Y Chen, E Kiciman, E Fratkin, A Fox, and E Brewer. 2002. Pinpoint: problem determination in large, dynamic Internet services. In Proceedings International Conference on Dependable Systems and Networks. 595–604. Google ScholarGoogle ScholarCross RefCross Ref
  9. Sheng Chen and Martin Erwig. 2014a. Counter-factual Typing for Debugging Type Errors. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 583–594. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Sheng Chen and Martin Erwig. 2014b. Guided Type Debugging. In Functional and Logic Programming, Michael Codish and Eijiro Sumii (Eds.). Springer International Publishing, 35–51. Google ScholarGoogle ScholarCross RefCross Ref
  11. Olaf Chitil. 2001. Compositional Explanation of Types and Algorithmic Debugging of Type Errors. In Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming (ICFP ’01). ACM, New York, NY, USA, 193–204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. David Raymond Christiansen. 2014. Reflect on your mistakes! Lightweight domain-specific error messages. In Preproceedings of the 15th Symposium on Trends in Functional Programming.Google ScholarGoogle Scholar
  13. P. Cousot and R. Cousot. 1977. Abstract interpretation: a unified lattice model for the static analysis of programs. In POPL 77. ACM, 238–252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Dominic Duggan and Frederick Bent. 1996. Explaining type inference. Science of Computer Programming 27, 1 (July 1996), 37–83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cormac Flanagan, Matthew Flatt, Shriram Krishnamurthi, Stephanie Weirich, and Matthias Felleisen. 1996. Catching bugs in the web of program invariants. In Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation (PLDI ’96), Vol. 31. ACM, 23–32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 5 (Nov. 1971), 378. Google ScholarGoogle ScholarCross RefCross Ref
  17. Mark Gabel and Zhendong Su. 2010. A Study of the Uniqueness of Source Code. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE ’10). ACM, New York, NY, USA, 147–156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Holger Gast. 2004. Explaining ML Type Errors by Data Flows. In Implementation and Application of Functional Languages. Springer Berlin Heidelberg, 72–89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Christian Haack and J B Wells. 2003. Type Error Slicing in Implicitly Typed Higher-Order Languages. In Programming Languages and Systems. Springer Berlin Heidelberg, 284–301. Google ScholarGoogle ScholarCross RefCross Ref
  20. Jurriaan Hage and Bastiaan Heeren. 2006. Heuristics for Type Error Discovery and Recovery. In Implementation and Application of Functional Languages. Springer Berlin Heidelberg, 199–216. Google ScholarGoogle ScholarCross RefCross Ref
  21. Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems 24, 2 (2009), 8–12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York. Google ScholarGoogle ScholarCross RefCross Ref
  23. Bastiaan Heeren, Jurriaan Hage, and S Doaitse Swierstra. 2003. Scripting the type inference process. In Proceedings of the eighth ACM SIGPLAN international conference on Functional programming, Vol. 38. ACM, 3–13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012a. On the Naturalness of Software. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 837–847. http://dl.acm.org/citation.cfm?id=2337223.2337322Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Abram Hindle, Earl T Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. 2012b. On the Naturalness of Software. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 837–847. Google ScholarGoogle ScholarCross RefCross Ref
  26. James A Jones, Mary Jean Harrold, and John Stasko. 2002. Visualization of Test Information to Assist Fault Localization. In Proceedings of the 24th International Conference on Software Engineering (ICSE ’02). ACM, New York, NY, USA, 467–477. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Stef Joosten, Klaas Van Den Berg, and Gerrit Van Der Hoeven. 1993. Teaching functional programming to first-year students. J. Funct. Programming 3, 01 (Jan. 1993), 49–65. Google ScholarGoogle ScholarCross RefCross Ref
  28. Manu Jose and Rupak Majumdar. 2011. Cause Clue Clauses: Error Localization Using Maximum Satisfiability. SIGPLAN Not. 46, 6 (June 2011), 437–446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. (22 Dec. 2014). arXiv: 1412.6980Google ScholarGoogle Scholar
  30. Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ Expectations on Automated Fault Localization. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). ACM, New York, NY, USA, 165–176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S B Kotsiantis. 2007. Supervised Machine Learning: A Review of Classification Techniques. Informatica 31, 3 (2007), 249–268.Google ScholarGoogle Scholar
  32. Ted Kremenek and Dawson Engler. 2003. Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations. In Static Analysis, Radhia Cousot (Ed.). Lecture Notes in Computer Science, Vol. 2694. Springer Berlin Heidelberg, Berlin, Heidelberg, 295–315. Google ScholarGoogle ScholarCross RefCross Ref
  33. K Krippendorff. 2012. Content Analysis: An Introduction to Its Methodology. SAGE Publications.Google ScholarGoogle Scholar
  34. J R Landis and G G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33, 1 (March 1977), 159–174. Google ScholarGoogle ScholarCross RefCross Ref
  35. Oukseh Lee and Kwangkeun Yi. 1998. Proofs About a Folklore Let-polymorphic Type Inference Algorithm. ACM Trans. Program. Lang. Syst. 20, 4 (July 1998), 707–723. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Eelco Lempsink. 2009. Generic type-safe diff and patch for families of datatypes. Master’s thesis. Universiteit Utrecht.Google ScholarGoogle Scholar
  37. Benjamin S Lerner, Matthew Flower, Dan Grossman, and Craig Chambers. 2007. Searching for Type-error Messages. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). ACM, New York, NY, USA, 425–434. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Calvin Loncaric, Satish Chandra, Cole Schlesinger, and Manu Sridharan. 2016. A practical framework for type inference error explanation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, 781–799. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. H B Mann and D R Whitney. 1947. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 18, 1 (March 1947), 50–60. Google ScholarGoogle ScholarCross RefCross Ref
  40. Bruce J McAdam. 1998. On the Unification of Substitutions in Type Inference. In Implementation of Functional Languages (Lecture Notes in Computer Science), Kevin Hammond, Tony Davie, and Chris Clack (Eds.). Springer Berlin Heidelberg, 137–152. Google ScholarGoogle ScholarCross RefCross Ref
  41. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10). 807–814.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Greg Nelson and Derek C Oppen. 1979. Simplification by Cooperating Decision Procedures. ACM Trans. Program. Lang. Syst. 1, 2 (Oct. 1979), 245–257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Michael A Nielsen. 2015. Neural Networks and Deep Learning. Determination Press.Google ScholarGoogle Scholar
  44. Zvonimir Pavlinovic, Tim King, and Thomas Wies. 2014. Finding Minimum Type Error Sources. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA ’14). ACM, New York, NY, USA, 525–542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. John Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.Google ScholarGoogle Scholar
  46. Vincent Rahli, J B Wells, and Fairouz Kamareddine. 2010. A constraint system for a SML type error slicer. Technical Report HW-MACS-TR-0079. Herriot Watt University.Google ScholarGoogle Scholar
  47. Veselin Raychev, Martin Vechev, and Andreas Krause. 2015. Predicting Program Properties from “Big Code”. In Proceedings of the 42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’15). ACM, New York, NY, USA, 111–124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code Completion with Statistical Language Models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 419–428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Thomas Schilling. 2011. Constraint-Free Type Error Slicing. In Trends in Functional Programming. Springer Berlin Heidelberg, 1–16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Eric L Seidel and Ranjit Jhala. 2017. A Collection of Novice Interactions with the OCaml Top-Level System. (June 2017). Google ScholarGoogle ScholarCross RefCross Ref
  51. Eric L Seidel, Ranjit Jhala, and Westley Weimer. 2016. Dynamic Witnesses for Static Type Errors (or, Ill-typed Programs Usually Go Wrong). In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016). ACM, New York, NY, USA, 228–242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Eric L. Seidel, Huma Sibghat, Kamalika Chaudhuri, Westley Weimer, and Ranjit Jhala. 2017. Learning to Blame: Localizing Novice Type Errors with Data-Driven Diagnosis. (Aug. 2017). arXiv: 1708.07583Google ScholarGoogle Scholar
  53. Alejandro Serrano and Jurriaan Hage. 2016. Type Error Diagnosis for Embedded DSLs by Two-Stage Specialized Type Rules. In Programming Languages and Systems. Springer Berlin Heidelberg, 672–698. Google ScholarGoogle ScholarCross RefCross Ref
  54. F Tip and T B Dinesh. 2001. A Slicing-based Approach for Locating Type Errors. ACM Trans. Softw. Eng. Methodol. 10, 1 (Jan. 2001), 5–55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Mitchell Wand. 1986. Finding the Source of Type Errors. In Proceedings of the 13th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL ’86). ACM, New York, NY, USA, 38–43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. W Eric Wong and Vidroha Debroy. 2009. A survey of software fault localization. Technical Report UTDCS-45-09. University of Texas at Dallas.Google ScholarGoogle Scholar
  57. Jun Yang. 1999. Explaining Type Errors by Finding the Source of a Type Conflict. In Selected Papers from the 1st Scottish Functional Programming Workshop (SFP ’99). Intellect Books, Exeter, UK, 59–67.Google ScholarGoogle Scholar
  58. Shin Yoo, Mark Harman, and David Clark. 2013. Fault Localization Prioritization: Comparing Information-theoretic and Coverage-based Approaches. ACM Trans. Softw. Eng. Methodol. 22, 3 (July 2013), 19:1–19:29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Danfeng Zhang and Andrew C Myers. 2014. Toward General Diagnosis of Static Errors. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 569–581. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Learning to blame: localizing novice type errors with data-driven diagnosis

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!