Abstract
This paper presents Apex, a system that automatically generates explanations for bugs in programming assignments, describing where the bugs are and how their root causes led to the runtime failures. It works by comparing the passing execution of a correct implementation (provided by the instructor) with the failing execution of the buggy implementation (submitted by the student). The technique overcomes a number of technical challenges caused by syntactic and semantic differences between the two implementations. It collects the symbolic traces of the executions and matches assignment statements in the two execution traces by reasoning about symbolic equivalence. It then matches predicates by aligning the control dependences of the matched assignment statements, avoiding direct matching of path conditions, which are usually quite different. Our evaluation shows that Apex is very effective on 205 buggy real-world student submissions for 4 programming assignments and on a set of 15 programming-assignment-style buggy programs collected from stackoverflow.com, precisely pinpointing the root causes and capturing the causality for 94.5% of them. An evaluation on a standard benchmark set with over 700 student bugs shows similar results. A user study in the classroom shows that Apex substantially improved student productivity.
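The idea of matching assignment statements across two traces by symbolic equivalence can be illustrated with a minimal sketch. It is not Apex's implementation: the trace format, the function names `equivalent` and `match_assignments`, and the greedy in-order matching are all assumptions made for illustration, and equivalence is approximated here by random testing rather than by the symbolic reasoning the paper describes.

```python
import random


def equivalent(expr_a, expr_b, inputs, trials=50, seed=0):
    """Approximate equivalence of two expressions over the program inputs.

    Assumption: random testing stands in for real symbolic reasoning.
    Expressions are Python arithmetic strings over the input names.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        env = {name: rng.randint(-100, 100) for name in inputs}
        value_a = eval(expr_a, {"__builtins__": {}}, env)
        value_b = eval(expr_b, {"__builtins__": {}}, env)
        if value_a != value_b:
            return False
    return True


def match_assignments(correct_trace, buggy_trace, inputs):
    """Greedily match assignments in execution order (hypothetical strategy).

    Each trace entry is (label, symbolic expression over the inputs).
    Returns the matched label pairs and the unmatched correct-side labels.
    """
    matches, unmatched = [], []
    next_j = 0  # never match backwards, so execution order is preserved
    for c_label, c_expr in correct_trace:
        for j in range(next_j, len(buggy_trace)):
            b_label, b_expr = buggy_trace[j]
            if equivalent(c_expr, b_expr, inputs):
                matches.append((c_label, b_label))
                next_j = j + 1
                break
        else:
            unmatched.append(c_label)
    return matches, unmatched


# Hypothetical traces: the student uses different names and operand order,
# and multiplies by the wrong constant in the second assignment.
correct_trace = [("s", "a + b"), ("r", "(a + b) * 2")]
buggy_trace = [("t", "b + a"), ("u", "(b + a) * 3")]

matches, unmatched = match_assignments(correct_trace, buggy_trace, ["a", "b"])
print(matches)    # pairs of labels whose symbolic values agree
print(unmatched)  # correct-side assignments with no equivalent counterpart
```

In this toy example the first assignments match despite the renaming and reordered operands, while the second correct-side assignment is left unmatched, which is the kind of difference an explanation would then start from.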
Apex: automatic programming assignment error explanation
OOPSLA 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications