Abstract
We present a new technique for automatically detecting logical errors in functional programming assignments. Compared to syntax or type errors, detecting logical errors remains largely a manual process that requires hand-made test cases. However, designing proper test cases is nontrivial and involves a lot of human effort. Furthermore, manual test cases are unlikely to catch diverse errors because instructors cannot predict all corner cases of diverse student submissions. We aim to reduce this burden by automatically generating test cases for functional programs. Given a reference program and a student's submission, our technique generates a counter-example that captures the semantic difference of the two programs without any manual effort. The key novelty behind our approach is the counter-example generation algorithm that combines enumerative search and symbolic verification techniques in a synergistic way. The experimental results show that our technique is able to detect 88 more errors not found by mature test cases that have been improved over the past few years, and performs better than the existing property-based testing techniques. We also demonstrate the usefulness of our technique in the context of automated program repair, where it effectively helps to eliminate test-suite-overfitted patches.
Supplemental Material
- Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis. In Proceedings of the 25th International Conference on Computer Aided Verification (CAV’13) . Springer-Verlag, Berlin, Heidelberg, 934–950. Google Scholar
Cross Ref
- Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to Write Programs. In ICLR.Google Scholar
- Sahil Bhatia, Pushmeet Kohli, and Rishabh Singh. 2018. Neuro-symbolic Program Corrector for Introductory Programming Assignments. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 60–70. Google Scholar
Digital Library
- Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008a. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08) . USENIX Association, Berkeley, CA, USA, 209–224. http://dl.acm.org/citation.cfm?id=1855741. 1855756Google Scholar
Digital Library
- Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2008b. EXE: Automatically Generating Inputs of Death. ACM Trans. Inf. Syst. Secur. 12, 2, Article 10 (Dec. 2008), 38 pages. Google Scholar
Digital Library
- C. Cadar, P. Godefroid, S. Khurshid, C. S. Pasareanu, K. Sen, N. Tillmann, and W. Visser. 2011. Symbolic execution for software testing in practice: preliminary assessment. In 2011 33rd International Conference on Software Engineering (ICSE). 1066–1071. Google Scholar
Digital Library
- Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP ’00) . ACM, New York, NY, USA, 268–279. Google Scholar
Digital Library
- Loris D’Antoni, Roopsha Samanta, and Rishabh Singh. 2016. Qlose: Program Repair with Quantiative Objectives. https: //www.microsoft.com/en-us/research/publication/qlose-program-repair-with-quantiative-objectives/Google Scholar
- Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08/ETAPS’08) . Springer-Verlag, Berlin, Heidelberg, 337–340. http://dl.acm.org/citation.cfm?id=1792734.1792766Google Scholar
Digital Library
- Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W. Reps. 2017. Component-based Synthesis for Complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 599–612. Google Scholar
Digital Library
- John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-output Examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15) . ACM, New York, NY, USA, 229–239. Google Scholar
Digital Library
- Robert Bruce Findler and Matthias Felleisen. 2002. Contracts for Higher-order Functions. In Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming (ICFP ’02) . ACM, New York, NY, USA, 48–59. Google Scholar
Digital Library
- Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, and Claire Le Goues. 2009. A Genetic Programming Approach to Automated Software Repair. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO ’09) . ACM, New York, NY, USA, 947–954. Google Scholar
Digital Library
- Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-directed Synthesis: A Typetheoretic Interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16) . ACM, New York, NY, USA, 802–815. Google Scholar
Digital Library
- Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. 2011. Synthesis of Loop-free Programs. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11) . ACM, New York, NY, USA, 62–73. Google Scholar
Digital Library
- Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI’17). AAAI Press, 1345–1351. http://dl.acm.org/citation.cfm?id=3298239.3298436Google Scholar
Cross Ref
- Phillip Heidegger and Peter Thiemann. 2010. Contract-Driven Testing of JavaScript Code. In Objects, Models, Components, Patterns , Jan Vitek (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 154–172.Google Scholar
- Sarfraz Khurshid, Corina S. Păsăreanu, and Willem Visser. 2003. Generalized Symbolic Execution for Model Checking and Testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’03) . Springer-Verlag, Berlin, Heidelberg, 553–568. http://dl.acm.org/citation.cfm?id=1765871.1765924Google Scholar
Digital Library
- Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic Patch Generation Learned from Humanwritten Patches. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA, 802–811. http://dl.acm.org/citation.cfm?id=2486788.2486893Google Scholar
Digital Library
- James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (July 1976), 385–394. Google Scholar
Digital Library
- Casey Klein, Matthew Flatt, and Robert Bruce Findler. 2010. Random Testing for Higher-order, Stateful Programs. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’10) . ACM, New York, NY, USA, 555–566. Google Scholar
Digital Library
- Etienne Kneuss, Ivan Kuraj, Viktor Kuncak, and Philippe Suter. 2013. Synthesis Modulo Recursive Functions. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications (OOPSLA ’13) . ACM, New York, NY, USA, 407–426. Google Scholar
Digital Library
- Pieter Koopman and Rinus Plasmeijer. 2006. Automatic Testing of Higher Order Functions. In Programming Languages and Systems , Naoki Kobayashi (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 148–164.Google Scholar
- Leonidas Lampropoulos, Diane Gallois-Wong, Cătălin Hriţcu, John Hughes, Benjamin C. Pierce, and Li-yao Xia. 2017. Beginner’s Luck: A Language for Property-based Generators. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017) . ACM, New York, NY, USA, 114–129. Google Scholar
Digital Library
- Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12) . IEEE Press, Piscataway, NJ, USA, 3–13. http://dl.acm.org/citation.cfm?id=2337223.2337225Google Scholar
Digital Library
- Junho Lee, Dowon Song, Sunbeom So, and Hakjoo Oh. 2018b. Automatic Diagnosis and Correction of Logical Errors for Functional Programming Assignments. Proc. ACM Program. Lang. 2, OOPSLA, Article 158 (Oct. 2018), 30 pages. Google Scholar
Digital Library
- Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018a. Accelerating Search-based Program Synthesis Using Learned Probabilistic Models. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018) . ACM, New York, NY, USA, 436–449. Google Scholar
Digital Library
- Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16) . ACM, New York, NY, USA, 298–312. Google Scholar
Digital Library
- Andreas Löscher and Konstantinos Sagonas. 2017. Targeted Property-based Testing. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2017) . ACM, New York, NY, USA, 46–56. Google Scholar
Digital Library
- Sergey Mechtaev, Manh-Dung Nguyen, Yannic Noller, Lars Grunske, and Abhik Roychoudhury. 2018. Semantic Program Repair Using a Reference Implementation. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18) . ACM, New York, NY, USA, 129–139. Google Scholar
Digital Library
- Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: Program Repair via Semantic Analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA, 772–781. http://dl.acm.org/citation.cfm?id=2486788.2486890Google Scholar
Digital Library
- Phúc C. Nguyen, Sam Tobin-Hochstadt, and David Van Horn. 2014. Soft Contract Verification. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP ’14) . ACM, New York, NY, USA, 139–152. Google Scholar
Digital Library
- Phúc C. Nguyen and David Van Horn. 2015. Relatively Complete Counterexamples for Higher-order Programs. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15) . ACM, New York, NY, USA, 446–456. Google Scholar
Digital Library
- Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed Program Synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15) . ACM, New York, NY, USA, 619–630. Google Scholar
Digital Library
- Michał H. Pałka, Koen Claessen, Alejandro Russo, and John Hughes. 2011. Testing an Optimising Compiler by Generating Random Lambda Terms. In Proceedings of the 6th International Workshop on Automation of Software Test (AST ’11). ACM, New York, NY, USA, 91–97. Google Scholar
Digital Library
- Phitchaya Mangpo Phothilimthana and Sumukh Sridhara. 2017. High-Coverage Hint Generation for Massive Courses: Do Automated Hints Help CS1 Students?. In Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education (ITiCSE ’17) . ACM, New York, NY, USA, 182–187. Google Scholar
Digital Library
- Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 522–538. Google Scholar
Digital Library
- Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, and Regina Barzilay. 2016. Sk_P: A Neural Program Corrector for MOOCs. In Companion Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity (SPLASH Companion 2016) . ACM, New York, NY, USA, 39–40. Google Scholar
Digital Library
- Marija Selakovic, Michael Pradel, Rezwana Karim, and Frank Tip. 2018. Test Generation for Higher-order Functions in Dynamic Languages. Proc. ACM Program. Lang. 2, OOPSLA, Article 161 (Oct. 2018), 27 pages. Google Scholar
Digital Library
- Rishabh Singh, Sumit Gulwani, and Armando Solar-Lezama. 2013. Automated Feedback Generation for Introductory Programming Assignments. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13) . ACM, New York, NY, USA, 15–26. Google Scholar
Digital Library
- Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015) . ACM, New York, NY, USA, 532–543. Google Scholar
Digital Library
- Sunbeom So and Hakjoo Oh. 2017. Synthesizing Imperative Programs from Examples Guided by Static Analysis. In Static Analysis , Francesco Ranzato (Ed.). Springer International Publishing, Cham, 364–381.Google Scholar
- Sunbeom So and Hakjoo Oh. 2018. Synthesizing Pattern Programs from Examples. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18) . AAAI Press, 1618–1624. http://dl.acm.org/citation.cfm?id=3304415.3304645Google Scholar
Digital Library
- Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding Compiler Bugs via Live Code Mutation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016) . ACM, New York, NY, USA, 849–863. Google Scholar
Digital Library
- Sam Tobin-Hochstadt and David Van Horn. 2012. Higher-order Symbolic Execution via Contracts. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’12) . ACM, New York, NY, USA, 537–554. Google Scholar
Digital Library
- Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing Highly Expressive SQL Queries from Input-output Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) . ACM, New York, NY, USA, 452–466. Google Scholar
Digital Library
- Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically Finding Patches Using Genetic Programming. In Proceedings of the 31st International Conference on Software Engineering (ICSE ’09). IEEE Computer Society, Washington, DC, USA, 364–374. Google Scholar
Digital Library
- Qi Xin and Steven P. Reiss. 2017. Identifying Test-suite-overfitted Patches Through Test Case Generation. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2017) . ACM, New York, NY, USA, 226–236. Google Scholar
Digital Library
- Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query Synthesis from Natural Language. Proc. ACM Program. Lang. 1, OOPSLA, Article 63 (Oct. 2017), 26 pages. Google Scholar
Digital Library
- Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, and Lin Tan. 2017. Better Test Cases for Better Automated Program Repair. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017) . ACM, New York, NY, USA, 831–841. Google Scholar
Digital Library
- Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11) . ACM, New York, NY, USA, 283–294. Google Scholar
Digital Library
Index Terms
Automatic and scalable detection of logical errors in functional programming assignments
Recommendations
Automatic diagnosis and correction of logical errors for functional programming assignments
We present FixML, a system for automatically generating feedback on logical errors in functional programming assignments. As functional languages have been gaining popularity, the number of students enrolling functional programming courses has increased ...
Automatically performing weak mutation with the aid of symbolic execution, concolic testing and search-based testing
Automating software testing activities can increase the quality and drastically decrease the cost of software development. Toward this direction, various automated test data generation tools have been developed. The majority of existing tools aim at ...
Towards automating the generation of mutation tests
AST '10: Proceedings of the 5th Workshop on Automation of Software TestAutomating software testing activities can increase the quality and drastically decrease the cost of software development. Towards this direction various automated test data generation tools have been developed. The majority of them aim at branch ...






Comments