Automatic and scalable detection of logical errors in functional programming assignments

Published: 10 October 2019
Abstract

We present a new technique for automatically detecting logical errors in functional programming assignments. Compared to syntax or type errors, detecting logical errors remains largely a manual process that requires hand-made test cases. However, designing proper test cases is nontrivial and involves a lot of human effort. Furthermore, manual test cases are unlikely to catch diverse errors because instructors cannot predict all corner cases of diverse student submissions. We aim to reduce this burden by automatically generating test cases for functional programs. Given a reference program and a student's submission, our technique generates a counter-example that captures the semantic difference of the two programs without any manual effort. The key novelty behind our approach is the counter-example generation algorithm that combines enumerative search and symbolic verification techniques in a synergistic way. The experimental results show that our technique is able to detect 88 more errors not found by mature test cases that have been improved over the past few years, and performs better than the existing property-based testing techniques. We also demonstrate the usefulness of our technique in the context of automated program repair, where it effectively helps to eliminate test-suite-overfitted patches.
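The idea of a counter-example that separates a reference program from a submission can be illustrated with a minimal sketch. It covers only a naive enumerative search over small inputs; the symbolic verification component described above is not modelled, and this is not the authors' algorithm. All names (`enumerate_int_lists`, `find_counterexample`, `reference_max`, `student_max`), the candidate value pool, and the search budget are assumptions made for illustration.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, List, Optional, Tuple

Program = Callable[[List[int]], object]


def enumerate_int_lists(values: Tuple[int, ...] = (-1, 0, 1, 2)) -> Iterator[List[int]]:
    """Yield integer lists in order of increasing length (hypothetical value pool)."""
    for length in count(0):
        for combo in product(values, repeat=length):
            yield list(combo)


def run(program: Program, arg: List[int]) -> Tuple[str, object]:
    """Run a program on a copy of the input, treating exceptions as observable results."""
    try:
        return ("ok", program(list(arg)))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_counterexample(reference: Program, submission: Program,
                        max_candidates: int = 10_000) -> Optional[List[int]]:
    """Return the first enumerated input on which the two programs disagree, if any."""
    for candidate in islice(enumerate_int_lists(), max_candidates):
        if run(reference, candidate) != run(submission, candidate):
            return candidate
    return None  # no difference found within the budget; this proves nothing


def reference_max(xs: List[int]) -> int:
    """Instructor's reference: largest element, ValueError on an empty list."""
    return max(xs)


def student_max(xs: List[int]) -> int:
    """Hypothetical buggy submission: starts from 0, so all-negative lists fail."""
    if not xs:
        raise ValueError("empty list")
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


print(find_counterexample(reference_max, student_max))  # prints [-1]
```

A hand-written test suite that only uses non-negative lists would accept `student_max`, while the enumeration reaches `[-1]` after two candidates. Pure enumeration of this kind only finds differences within the chosen value pool and budget, which is one reason to pair it with a symbolic technique.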

Supplemental Material

a188-song

Presentation at OOPSLA '19
