Research Article | Public Access

Apex: automatic programming assignment error explanation

Published: 19 October 2016

Abstract

This paper presents Apex, a system that automatically generates explanations for programming assignment bugs, describing where the bugs are and how their root causes lead to the runtime failures. It works by comparing the passing execution of a correct implementation (provided by the instructor) against the failing execution of the buggy implementation (submitted by the student). The technique overcomes a number of technical challenges caused by the syntactic and semantic differences between the two implementations. It collects the symbolic traces of the executions and matches assignment statements in the two traces by reasoning about symbolic equivalence. It then matches predicates by aligning the control dependences of the matched assignment statements, avoiding direct matching of path conditions, which are usually quite different. Our evaluation shows that Apex is very effective: for 205 buggy real-world student submissions from 4 programming assignments, plus a set of 15 assignment-style buggy programs collected from stackoverflow.com, it precisely pinpoints the root causes and captures the causality for 94.5% of them. An evaluation on a standard benchmark set with over 700 student bugs shows similar results. A user study in the classroom shows that Apex substantially improves student productivity.
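The core idea of the abstract, matching assignment statements across the two executions by symbolic equivalence rather than by syntax, can be illustrated with a minimal sketch. This is a hypothetical illustration, not Apex's actual implementation: expressions are modeled as nested tuples over input symbols, equivalence is approximated by canonicalizing commutative operators, and the `canon` and `match_assignments` helpers are names invented for this example.

```python
# Hypothetical sketch of symbolic-equivalence matching (not Apex's code).
# Each trace entry is (variable, symbolic_expression), where expressions
# are nested tuples such as ('+', 'lo', 'hi') over the program's inputs.

def canon(expr):
    """Canonicalize an expression so that semantically equivalent forms
    compare equal; here we only sort the operands of the commutative
    operators '+' and '*'. A real system would use an SMT solver."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canon(a) for a in args]
    if op in ('+', '*'):
        args.sort(key=repr)
    return (op, *args)

def match_assignments(ref_trace, buggy_trace):
    """Greedily pair entries whose symbolic values are equivalent.
    Entries of the buggy trace left unmatched are candidate
    root-cause locations for the failure."""
    matched, used = [], set()
    for i, (_, e_ref) in enumerate(ref_trace):
        for j, (_, e_bug) in enumerate(buggy_trace):
            if j not in used and canon(e_ref) == canon(e_bug):
                matched.append((i, j))
                used.add(j)
                break
    unmatched = [j for j in range(len(buggy_trace)) if j not in used]
    return matched, unmatched

# Reference computes m = (lo + hi) / 2; the buggy version computes
# m = lo + hi / 2, so the traces diverge at the second assignment.
ref_trace = [('s', ('+', 'lo', 'hi')), ('m', ('/', ('+', 'lo', 'hi'), 2))]
bug_trace = [('s', ('+', 'hi', 'lo')), ('m', ('+', 'lo', ('/', 'hi', 2)))]
matched, suspicious = match_assignments(ref_trace, bug_trace)
```

Here the first assignments match despite their different operand order, while the buggy midpoint computation remains unmatched and is flagged as suspicious, mirroring (in a highly simplified way) how trace alignment isolates the root cause.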



Published in

ACM SIGPLAN Notices, Volume 51, Issue 10 (OOPSLA '16), October 2016. 915 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/3022671

Also appears in: OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2016. 915 pages. ISBN: 9781450344449. DOI: 10.1145/2983990

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

