Abstract
We present Seq2Parse, a language-agnostic neurosymbolic approach to automatically repairing parse errors. Seq2Parse is based on the insight that Symbolic Error Correcting (EC) Parsers can, in principle, synthesize repairs, but, in practice, are overwhelmed by the many error-correction rules that are not relevant to the particular program that requires repair. In contrast, Neural approaches are fooled by the large space of possible sequence level edits, but can precisely pinpoint the set of EC-rules that are relevant to a particular program. We show how to combine their complementary strengths by using neural methods to train a sequence classifier that predicts the small set of relevant EC-rules for an ill-parsed program, after which, the symbolic EC-parsing algorithm can make short work of generating useful repairs. We train and evaluate our method on a dataset of 1,100,000 Python programs, and show that Seq2Parse is accurate and efficient: it can parse 94% of our tests within 2.1 seconds, while generating the exact user fix in 1 out 3 of the cases; and useful: humans perceive both Seq2Parse-generated error locations and repairs to be almost as good as human-generated ones in a statistically-significant manner.
- Alireza Ahadi, Raymond Lister, Shahil Lal, and Arto Hellas. 2018. Learning Programming, Syntax Errors and Institution-Specific Factors. In Proceedings of the 20th Australasian Computing Education Conference (ACE ’18). Association for Computing Machinery, New York, NY, USA. 90–96. isbn:9781450363402 https://doi.org/10.1145/3160489.3160490
Google Scholar
Digital Library
- Toufique Ahmed, Premkumar Devanbu, and Vincent J Hellendoorn. 2021. Learning lenient parsing & typing via indirect supervision. Empirical Software Engineering, 26, 2 (2021), mar, https://doi.org/10.1007/s10664-021-09942-y
Google Scholar
Digital Library
- A. V. Aho and S. C. Johnson. 1974. LR Parsing. ACM Comput. Surv., 6, 2 (1974), jun, 99–124. issn:0360-0300 https://doi.org/10.1145/356628.356629
Google Scholar
Digital Library
- Alfred V. Aho and Thomas G. Peterson. 1972. A Minimum Distance Error-Correcting Parser for Context-Free Languages. SIAM J. Comput., 1 (1972), 305–312.
Google Scholar
Digital Library
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473 (2015).
Google Scholar
- Sahil Bhatia and Rishabh Singh. 2016. Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. https://doi.org/10.48550/ARXIV.1603.06129
Google Scholar
- Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg. 209–210. isbn:0387310738
Google Scholar
Digital Library
- Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. https://doi.org/10.48550/ARXIV.2005.14165
Google Scholar
- Michael G. Burke and Gerald A. Fisher. 1987. A Practical Method for LR and LL Syntactic Error Diagnosis and Recovery. ACM Trans. Program. Lang. Syst., 9, 2 (1987), mar, 164–197. issn:0164-0925 https://doi.org/10.1145/22719.22720
Google Scholar
Digital Library
- Nigel P Chapman. 1987. LR Parsing: Theory and Practice. Cambridge University Press, New York, NY, USA. isbn:0-521-30413-X
Google Scholar
- Michael Collins. 2013. Probabilistic Context-Free Grammars (PCFGs). Lecture Notes, https://u.cs.biu.ac.il/~89-680/collins-pcfgs.pdf
Google Scholar
- Rafael Corchuelo, José A. Pérez, Antonio Ruiz, and Miguel Toro. 2002. Repairing Syntax Errors in LR Parsers. ACM Trans. Program. Lang. Syst., 24, 6 (2002), nov, 698–710. issn:0164-0925 https://doi.org/10.1145/586088.586092
Google Scholar
Digital Library
- Benjamin Cosman, Madeline Endres, Georgios Sakkas, Leon Medvinsky, Yao-Yuan Yang, Ranjit Jhala, Kamalika Chaudhuri, and Westley Weimer. 2020. PABLO: Helping Novices Debug Python Code Through Data-Driven Fault Localization. Association for Computing Machinery, New York, NY, USA. 1047–1053. isbn:9781450367936 https://doi.org/10.1145/3328778.3366860
Google Scholar
Digital Library
- Paul Denny, Andrew Luxton-Reilly, and Ewan Tempero. 2012. All Syntax Errors Are Not Equal. In Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE ’12). Association for Computing Machinery, New York, NY, USA. 75–80. isbn:9781450312462 https://doi.org/10.1145/2325296.2325318
Google Scholar
Digital Library
- Jay Earley. 1970. An Efficient Context-Free Parsing Algorithm. Commun. ACM, 13, 2 (1970), Feb., 94–102. issn:0001-0782 https://doi.org/10.1145/362007.362035
Google Scholar
Digital Library
- Madeline Endres, Georgios Sakkas, Benjamin Cosman, Ranjit Jhala, and Westley Weimer. 2019. InFix: Automatically Repairing Novice Program Inputs. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE ’19). IEEE Press, 399–410. isbn:9781728125084 https://doi.org/10.1109/ASE.2019.00045
Google Scholar
Digital Library
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press, 180–184. http://www.deeplearningbook.org
Google Scholar
Digital Library
- Sumit Gulwani, Ivan Radicek, and Florian Zuleger. 2018. Automated clustering and program repair for introductory programming assignments. Programming Language Design and Implementation, isbn:9781450356985 https://doi.org/10.1145/3192366.3192387
Google Scholar
Digital Library
- Philip J. Guo. 2013. Online Python Tutor: Embeddable Web-Based Program Visualization for Cs Education. In Proceeding of the 44th ACM Technical Symposium on Computer Science Education (SIGCSE ’13). Association for Computing Machinery, New York, NY, USA. 579–584. isbn:9781450318686 https://doi.org/10.1145/2445196.2445368
Google Scholar
Digital Library
- Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31, 1 (2017), Feb., https://ojs.aaai.org/index.php/AAAI/article/view/10742
Google Scholar
Cross Ref
- Momchil Hardalov, Ivan Koychev, and Preslav Nakov. 2018. Towards Automated Customer Support. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications. 48–59. https://doi.org/10.1007/978-3-319-99344-7_5
Google Scholar
Cross Ref
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York. isbn:9780387848570, 9780387848587 https://doi.org/10.1007/978-0-387-84858-7
Google Scholar
Cross Ref
- Frederick Jelinek, John D Lafferty, and Robert L Mercer. 1992. Basic Methods of Probabilistic Context Free Grammars. In Speech Recognition and Understanding. Springer, 345–360. https://doi.org/10.1007/978-3-642-76626-8_35
Google Scholar
Cross Ref
- Nan Jiang, Thibaud Lutellier, and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE.
Google Scholar
- Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. 2017. Structured Attention Networks. CoRR, abs/1702.00887 (2017), arXiv:1702.00887. arxiv:1702.00887
Google Scholar
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. 22 Dec., arxiv:1412.6980.
Google Scholar
- Donald E. Knuth. 1965. On the translation of languages from left to right. Information and Control, 8, 6 (1965), 607–639. issn:0019-9958 https://doi.org/10.1016/S0019-9958(65)90426-2
Google Scholar
Cross Ref
- Pavneet Singh Kochhar, Xin Xia, David Lo, and Shanping Li. 2016. Practitioners’ Expectations on Automated Fault Localization. In International Symposium on Software Testing and Analysis. ACM, 165–176. isbn:9781450343909 https://doi.org/10.1145/2931037.2931051
Google Scholar
Digital Library
- Sarah K. Kummerfeld and Judy Kay. 2003. The Neglected Battle Fields of Syntax Errors. In Proceedings of the Fifth Australasian Conference on Computing Education - Volume 20 (ACE ’03). Australian Computer Society, Inc., AUS. 105–111. isbn:0909925984
Google Scholar
- Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. 2020. CoCoNuT: Combining Context-Aware Neural Translation Models Using Ensemble for Program Repair. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). Association for Computing Machinery, New York, NY, USA. 101–114. isbn:9781450380089 https://doi.org/10.1145/3395363.3397369
Google Scholar
Digital Library
- Matias Martinez, Laurence Duchien, and Martin Monperrus. 2013. Automatically extracting instances of code change patterns with AST analysis. In 2013 IEEE international conference on software maintenance. 388–391.
Google Scholar
Digital Library
- Philippe McLean and R. Nigel Horspool. 1996. A Faster Earley Parser. In CC.
Google Scholar
- Eugene W Myers. 1986. An O(ND) difference algorithm and its variations. Algorithmica, 1, 1-4 (1986), 251–266.
Google Scholar
Digital Library
- Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In International Conference on Machine Learning. 807–814.
Google Scholar
Digital Library
- Michael A Nielsen. 2015. Neural Networks and Deep Learning. Determination Press.
Google Scholar
- Chris Parnin and Alessandro Orso. 2011. Are Automated Debugging Techniques Actually Helping Programmers? In International Symposium on Software Testing and Analysis. ACM, 199–209. isbn:9781450305624 https://doi.org/10.1145/2001420.2001445
Google Scholar
Digital Library
- Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. sk_p: a neural program corrector for MOOCs. https://doi.org/10.48550/ARXIV.1607.02902
Google Scholar
- Yizhou Qian and James Lehman. 2017. Students’ Misconceptions and Other Difficulties in Introductory Programming: A Literature Review. ACM Trans. Comput. Educ., 18, 1 (2017), Article 1, oct, 24 pages. https://doi.org/10.1145/3077618
Google Scholar
Digital Library
- Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2021. Multi-Modal Program Inference: A Marriage of Pre-Trained Language Models and Component-Based Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 158, oct, 29 pages. https://doi.org/10.1145/3485535
Google Scholar
Digital Library
- Sanguthevar Rajasekaran and Marius Nicolae. 2014. An error correcting parser for context free grammars that takes less than cubic time. https://doi.org/10.48550/ARXIV.1406.3405
Google Scholar
- David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-propagating Errors. Nature, 323, 6088 (1986), 533–536. https://doi.org/10.1038/323533a0
Google Scholar
Cross Ref
- Georgios Sakkas, Madeline Endres, Benjamin Cosman, Westley Weimer, and Ranjit Jhala. 2020. Type Error Feedback via Analytic Program Repair. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA. 16–30. isbn:9781450376136 https://doi.org/10.1145/3385412.3386005
Google Scholar
Digital Library
- George Sakkas, Ranjit Jhala, Westley Weimer, and Madeline Endres. 2022. gsakkas/seq2parse: OOPSLA2 2022 code release. https://doi.org/10.5281/zenodo.7080357
Google Scholar
Digital Library
- Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks, 61 (2015), Jan, 85–117. issn:0893-6080 https://doi.org/10.1016/j.neunet.2014.09.003
Google Scholar
Digital Library
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. arxiv:1409.3215.
Google Scholar
Digital Library
- Richard A. Thompson. 1976. Language Correction Using Probabilistic Grammars. IEEE Trans. Comput., C-25, 3 (1976), 275–286. https://doi.org/10.1109/TC.1976.5009254
Google Scholar
Digital Library
- P. van der Spek, N. Plat, and C. Pronk. 2005. Syntax Error Repair for a Java-Based Parser Generator. SIGPLAN Not., 40, 4 (2005), April, 47–50. issn:0362-1340 https://doi.org/10.1145/1064165.1064173
Google Scholar
Digital Library
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc., 5998–6008. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Google Scholar
- Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477
Google Scholar
Digital Library
- Ke Wang, Rishabh Singh, and Zhendong Su. 2018. Search, Align, and Repair: Data-driven Feedback Generation for Introductory Programming Exercises. In Programming Language Design and Implementation. 481–495. isbn:978-1-4503-5698-5 https://doi.org/10.1145/3192366.3192384
Google Scholar
Digital Library
- P.J. Werbos. 1990. Backpropagation through time: what it does and how to do it. Proc. IEEE, 78, 10 (1990), 1550–1560. https://doi.org/10.1109/5.58337
Google Scholar
Cross Ref
- Liwei Wu, Fei Li, Youhua Wu, and Tao Zheng. 2020. GGF: A Graph-Based Method for Programming Language Syntax Error Correction. Association for Computing Machinery, New York, NY, USA. 139–148. isbn:9781450379588 https://doi.org/10.1145/3387904.3389252
Google Scholar
Digital Library
Index Terms
Seq2Parse: neurosymbolic parse error repair
Recommendations
Error repair in shift-reduce parsers
Local error repair of strings during CFG parsing requires the insertion and deletion of symbols in the region of a syntax error to produce a string that is error free. Rather than precalculating tables at parser generation time to assist in finding such ...
Type error feedback via analytic program repair
PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and ImplementationWe introduce Analytic Program Repair, a data-driven strategy for providing feedback for type-errors via repairs for the erroneous program. Our strategy is based on insight that similar errors have similar repairs. Thus, we show how to use a training ...
Do automated program repair techniques repair hard and important bugs?
Existing evaluations of automated repair techniques focus on the fraction of the defects for which the technique can produce a patch, the time needed to produce patches, and how well patches generalize to the intended specification. However, these ...






Comments