Abstract
Localizing type errors is challenging in languages with global type inference, as the type checker must make assumptions about what the programmer intended to do. We introduce Nate, a data-driven approach to error localization based on supervised learning. Nate analyzes a large corpus of training data -- pairs of ill-typed programs and their "fixed" versions -- to automatically learn a model of where the error is most likely to be found. Given a new ill-typed program, Nate executes the model to generate a list of potential blame assignments ranked by likelihood. We evaluate Nate by comparing its precision to the state of the art on a set of over 5,000 ill-typed OCaml programs drawn from two instances of an introductory programming course. We show that when the top-ranked blame assignment is considered, Nate's data-driven model is able to correctly predict the exact sub-expression that should be changed 72% of the time, 28 points higher than OCaml and 16 points higher than the state-of-the-art SHErrLoc tool. Furthermore, Nate's accuracy surpasses 85% when we consider the top two locations and reaches 91% if we consider the top three.
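The ranked-list evaluation described above can be illustrated with a minimal sketch. It is not Nate's implementation: the function names, the sub-expression labels (`e1`, `e2`, `e3`), the scores, and the rule that a prediction counts as correct when any truly changed sub-expression appears in the top k are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set


def rank_blame(scores: Dict[str, float]) -> List[str]:
    """Return sub-expression labels ordered from most to least likely to be at fault."""
    # Sort by descending score; ties are broken by label so the order is deterministic.
    return sorted(scores, key=lambda e: (-scores[e], e))


def top_k_accuracy(
    rankings: Sequence[List[str]], truths: Sequence[Set[str]], k: int
) -> float:
    """Fraction of programs where a truly changed sub-expression is in the top k."""
    hits = sum(
        1 for ranked, changed in zip(rankings, truths) if set(ranked[:k]) & changed
    )
    return hits / len(rankings)


# Hypothetical toy data: per-program blame scores (as a learned model might emit)
# paired with the set of sub-expressions that the student's fix actually changed.
programs = [
    ({"e1": 0.7, "e2": 0.2, "e3": 0.1}, {"e1"}),
    ({"e1": 0.3, "e2": 0.5, "e3": 0.2}, {"e1"}),
    ({"e1": 0.1, "e2": 0.3, "e3": 0.6}, {"e1"}),
]
rankings = [rank_blame(scores) for scores, _ in programs]
truths = [changed for _, changed in programs]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(rankings, truths, k):.2f}")
```

On this toy data the accuracy rises as k grows, which is the same shape as the top-one, top-two, and top-three figures reported in the abstract.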
Learning to blame: localizing novice type errors with data-driven diagnosis