
Learning user friendly type-error messages

Published: 12 October 2017

Abstract

Type inference is convenient because it lets programmers elide type annotations, but it often produces confusing and opaque type error messages that are of little help in fixing type errors. Although there have been many successful attempts over the past thirty years to improve type error messages, many classes of errors remain difficult to fix. In particular, current approaches still generate imprecise and uninformative messages for type errors that arise from mistakes in grouping constructs such as parentheses and brackets. Worse, a recent study shows that these errors usually take more than 10 steps to fix and occur quite frequently (around 45% to 60% of all type errors) in programs written by students learning functional programming. We call this class of errors nonstructural errors.

We solve this problem by developing Learnskell, a type error debugger that uses machine learning to diagnose nonstructural errors and deliver high-quality error messages for programs that contain them. While previous approaches usually report type errors in terms of typing constraints or at the type level, Learnskell generates suggestions at the expression level. In an evaluation on more than 1,500 type errors, Learnskell correctly captures 86% of all nonstructural errors and locates the error cause with a precision of 63% with the first message and 87% with the first three messages. This is several times higher than the precision of state-of-the-art compilers and debuggers. We have also studied the performance of Learnskell and found that it scales to large programs.
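To make the precision figures quoted above concrete, the following is a minimal sketch of how a "first k messages" precision could be computed under one plausible reading: the fraction of erroneous programs whose true error cause appears among the first k suggestions. The function name top_k_precision, the location labels, and the toy data are hypothetical illustrations and are not taken from Learnskell or its evaluation.

```python
from typing import Hashable, Sequence


def top_k_precision(
    ranked_suggestions: Sequence[Sequence[Hashable]],
    true_causes: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of cases whose true cause appears in the first k suggestions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_suggestions) != len(true_causes):
        raise ValueError("need exactly one true cause per suggestion list")
    if not true_causes:
        return 0.0
    hits = sum(
        1
        for suggestions, cause in zip(ranked_suggestions, true_causes)
        if cause in suggestions[:k]
    )
    return hits / len(true_causes)


# Toy data (hypothetical): each inner list ranks candidate expression locations
# for one ill-typed program, most likely cause first.
suggestions = [
    ["e3", "e1", "e7"],
    ["e2", "e5", "e4"],
    ["e9", "e8", "e6"],
    ["e1", "e2", "e3"],
]
causes = ["e3", "e4", "e0", "e2"]

precision_at_1 = top_k_precision(suggestions, causes, k=1)
precision_at_3 = top_k_precision(suggestions, causes, k=3)
```

On this toy data, only the first program has its true cause ranked first, while three of the four have it within the first three suggestions, which is why allowing more messages raises the measured precision.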

                 Precision              Recall                 F1
Error Class      Base   SMO    SMO-CV   Base   SMO    SMO-CV   Base   SMO    SMO-CV
Delete Brackets  0.769  0.773  0.807    0.671  0.731  0.819    0.800  0.740  0.804
Merge            0.858  0.824  0.853    0.925  0.944  0.906    0.887  0.878  0.873
Pull             0.875  0.825  0.930    0.510  0.570  0.625    0.623  0.662  0.715
Add Brackets     0.795  0.778  0.797    0.853  0.853  0.853    0.814  0.800  0.816
Has $            0.750  0.723  0.867    0.600  0.850  0.800    0.650  0.741  0.813
Structural       0.847  0.895  0.883    0.868  0.850  0.851    0.856  0.871  0.866

Fig. 12. A comparison of oversampling techniques. We use the short names SMO and SMO-CV for SMOTE
