skip to main content

Seq2Parse: neurosymbolic parse error repair

Published:31 October 2022Publication History
Skip Abstract Section

Abstract

We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient: it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful: humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.

References

  1. Alireza Ahadi, Raymond Lister, Shahil Lal, and Arto Hellas. 2018. Learning Programming, Syntax Errors and Institution-Specific Factors. In Proceedings of the 20th Australasian Computing Education Conference (ACE ’18). Association for Computing Machinery, New York, NY, USA. 90–96. isbn:9781450363402 https://doi.org/10.1145/3160489.3160490 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Toufique Ahmed, Premkumar Devanbu, and Vincent J Hellendoorn. 2021. Learning lenient parsing & typing via indirect supervision. Empirical Software Engineering, 26, 2 (2021), mar, https://doi.org/10.1007/s10664-021-09942-y Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. V. Aho and S. C. Johnson. 1974. LR Parsing. ACM Comput. Surv., 6, 2 (1974), jun, 99–124. issn:0360-0300 https://doi.org/10.1145/356628.356629 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Alfred V. Aho and Thomas G. Peterson. 1972. A Minimum Distance Error-Correcting Parser for Context-Free Languages. SIAM J. Comput., 1 (1972), 305–312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473 (2015). Google ScholarGoogle Scholar
  6. Sahil Bhatia and Rishabh Singh. 2016. Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. https://doi.org/10.48550/ARXIV.1603.06129 Google ScholarGoogle Scholar
  7. Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg. 209–210. isbn:0387310738 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. https://doi.org/10.48550/ARXIV.2005.14165 Google ScholarGoogle Scholar
  9. Michael G. Burke and Gerald A. Fisher. 1987. A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery. ACM Trans. Program. Lang. Syst., 9, 2 (1987), mar, 164–197. issn:0164-0925 https://doi.org/10.1145/22719.22720 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Nigel P Chapman. 1987. LR Parsing: Theory and Practice. Cambridge University Press, New York, NY, USA. isbn:0-521-30413-X Google ScholarGoogle Scholar
  11. Michael Collins. 2013. Probabilistic Context-Free Grammars (PCFGs). Lecture Notes, https://u.cs.biu.ac.il/~89-680/collins-pcfgs.pdf Google ScholarGoogle Scholar
  12. Rafael Corchuelo, José A. Pérez, Antonio Ruiz, and Miguel Toro. 2002. Repairing Syntax Errors in LR Parsers. ACM Trans. Program. Lang. Syst., 24, 6 (2002), nov, 698–710. issn:0164-0925 https://doi.org/10.1145/586088.586092 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Benjamin Cosman, Madeline Endres, Georgios Sakkas, Leon Medvinsky, Yao-Yuan Yang, Ranjit Jhala, Kamalika Chaudhuri, and Westley Weimer. 2020. PABLO: Helping Novices Debug Python Code Through Data-Driven Fault Localization. Association for Computing Machinery, New York, NY, USA. 1047–1053. isbn:9781450367936 https://doi.org/10.1145/3328778.3366860 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Paul Denny, Andrew Luxton-Reilly, and Ewan Tempero. 2012. All Syntax Errors Are Not Equal. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE ’12). Association for Computing Machinery, New York, NY, USA. 75–80. isbn:9781450312462 https://doi.org/10.1145/2325296.2325318 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jay Earley. 1970. An Efficient Context-Free Parsing Algorithm. Commun. ACM, 13, 2 (1970), Feb., 94–102. issn:0001-0782 https://doi.org/10.1145/362007.362035 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Madeline Endres, Georgios Sakkas, Benjamin Cosman, Ranjit Jhala, and Westley Weimer. 2019. InFix: Automatically Repairing Novice Program Inputs. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE ’19). IEEE Press, 399–410. isbn:9781728125084 https://doi.org/10.1109/ASE.2019.00045 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press, 180–184. http://www.deeplearningbook.org Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Sumit Gulwani, Ivan Radicek, and Florian Zuleger. 2018. Automated clustering and program repair for introductory programming assignments. Programming Language Design and Implementation, isbn:9781450356985 https://doi.org/10.1145/3192366.3192387 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Philip J. Guo. 2013. Online Python Tutor: Embeddable Web-Based Program Visualization for Cs Education. In Proceeding of the 44th ACM Technical Symposium on Computer Science Education (SIGCSE ’13). Association for Computing Machinery, New York, NY, USA. 579–584. isbn:9781450318686 https://doi.org/10.1145/2445196.2445368 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31, 1 (2017), Feb., https://ojs.aaai.org/index.php/AAAI/article/view/10742 Google ScholarGoogle ScholarCross RefCross Ref
  21. Momchil Hardalov, Ivan Koychev, and Preslav Nakov. 2018. Towards Automated Customer Support. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications. 48–59. https://doi.org/10.1007/978-3-319-99344-7_5 Google ScholarGoogle ScholarCross RefCross Ref
  22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York. isbn:9780387848570, 9780387848587 https://doi.org/10.1007/978-0-387-84858-7 Google ScholarGoogle ScholarCross RefCross Ref
  23. Frederick Jelinek, John D Lafferty, and Robert L Mercer. 1992. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Springer, 345–360. https://doi.org/10.1007/978-3-642-76626-8_35 Google ScholarGoogle ScholarCross RefCross Ref
  24. Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE. Google ScholarGoogle Scholar
  25. Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2017. Structured Attention Networks. CoRR, abs/1702.00887 (2017), arXiv:1702.00887. arxiv:1702.00887 Google ScholarGoogle Scholar
  26. Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. 22 Dec., arxiv:1412.6980. Google ScholarGoogle Scholar
  27. Donald E. Knuth. 1965. On the translation of languages from left to right. Information and Control, 8, 6 (1965), 607–639. issn:0019-9958 https://doi.org/10.1016/S0019-9958(65)90426-2 Google ScholarGoogle ScholarCross RefCross Ref
  28. Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ Expectations on Automated Fault Localization. In International Symposium on Software Testing and Analysis. ACM, 165–176. isbn:9781450343909 https://doi.org/10.1145/2931037.2931051 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sarah K. Kummerfeld and Judy Kay. 2003. The Neglected Battle Fields of Syntax Errors. In Proceedings of the Fifth Australasian Conference on Computing Education - Volume 20 (ACE ’03). Australian Computer Society, Inc., AUS. 105–111. isbn:0909925984 Google ScholarGoogle Scholar
  30. Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. 2020. CoCoNuT: Combining Context-Aware Neural Translation Models Using Ensemble for Program Repair. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). Association for Computing Machinery, New York, NY, USA. 101–114. isbn:9781450380089 https://doi.org/10.1145/3395363.3397369 Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Matias Martinez, Laurence Duchien, and Martin Monperrus. 2013. Automatically extracting instances of code change patterns with AST analysis. In 2013 IEEE international conference on software maintenance. 388–391. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Philippe McLean and R. Nigel Horspool. 1996. A Faster Earley Parser. In CC. Google ScholarGoogle Scholar
  33. Eugene W Myers. 1986. An O(ND) difference algorithm and its variations. Algorithmica, 1, 1-4 (1986), 251–266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning. 807–814. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Michael A Nielsen. 2015. Neural Networks and Deep Learning. Determination Press. Google ScholarGoogle Scholar
  36. Chris Parnin and Alessandro Orso. 2011. Are Automated Debugging Techniques Actually Helping Programmers? In International Symposium on Software Testing and Analysis. ACM, 199–209. isbn:9781450305624 https://doi.org/10.1145/2001420.2001445 Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: a neural program corrector for MOOCs. https://doi.org/10.48550/ARXIV.1607.02902 Google ScholarGoogle Scholar
  38. Yizhou Qian and James Lehman. 2017. Students’ Misconceptions and Other Difficulties in Introductory Programming: A Literature Review. ACM Trans. Comput. Educ., 18, 1 (2017), Article 1, oct, 24 pages. https://doi.org/10.1145/3077618 Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2021. Multi-Modal Program Inference: A Marriage of Pre-Trained Language Models and Component-Based Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 158, oct, 29 pages. https://doi.org/10.1145/3485535 Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Sanguthevar Rajasekaran and Marius Nicolae. 2014. An error correcting parser for context free grammars that takes less than cubic time. https://doi.org/10.48550/ARXIV.1406.3405 Google ScholarGoogle Scholar
  41. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-propagating Errors. Nature, 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0 Google ScholarGoogle ScholarCross RefCross Ref
  42. Georgios Sakkas, Madeline Endres, Benjamin Cosman, Westley Weimer, and Ranjit Jhala. 2020. Type Error Feedback via Analytic Program Repair. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA. 16–30. isbn:9781450376136 https://doi.org/10.1145/3385412.3386005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. George Sakkas, Ranjit Jhala, Westley Weimer, and Madeline Endres. 2022. gsakkas/seq2parse: OOPSLA2 2022 code release. https://doi.org/10.5281/zenodo.7080357 Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks, 61 (2015), Jan, 85–117. issn:0893-6080 https://doi.org/10.1016/j.neunet.2014.09.003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. arxiv:1409.3215. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Richard A. Thompson. 1976. Language Correction Using Probabilistic Grammars. IEEE Trans. Comput., C-25, 3 (1976), 275–286. https://doi.org/10.1109/TC.1976.5009254 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. P. van der Spek, N. Plat, and C. Pronk. 2005. Syntax Error Repair for a Java-Based Parser Generator. SIGPLAN Not., 40, 4 (2005), April, 47–50. issn:0362-1340 https://doi.org/10.1145/1064165.1064173 Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc., 5998–6008. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf Google ScholarGoogle Scholar
  49. Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477 Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises. In Programming Language Design and Implementation. 481–495. isbn:978-1-4503-5698-5 https://doi.org/10.1145/3192366.3192384 Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. P.J. Werbos. 1990. Backpropagation through time: what it does and how to do it. Proc. IEEE, 78, 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337 Google ScholarGoogle ScholarCross RefCross Ref
  52. Liwei Wu, Fei Li, Youhua Wu, and Tao Zheng. 2020. GGF: A Graph-Based Method for Programming Language Syntax Error Correction. Association for Computing Machinery, New York, NY, USA. 139–148. isbn:9781450379588 https://doi.org/10.1145/3387904.3389252 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Seq2Parse: neurosymbolic parse error repair

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Article Metrics

              • Downloads (Last 12 months)219
              • Downloads (Last 6 weeks)31

              Other Metrics

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!