DOI: 10.1007/978-3-030-52237-7_15
Article

Fooling Automatic Short Answer Grading Systems

Published: 06 July 2020

Abstract

With the rising success of adversarial attacks on many NLP tasks, systems which actually operate in an adversarial scenario need to be reevaluated. For this purpose, we pose the following research question: How difficult is it to fool automatic short answer grading systems? In particular, we investigate the robustness of the state-of-the-art automatic short answer grading system proposed by Sung et al. towards cheating in the form of universal adversarial trigger employment. These are short token sequences that can be prepended to students’ answers in an exam to artificially improve their automatically assigned grade. Such triggers are especially critical as they can easily be used by anyone once they are found. In our experiments, we discovered triggers which allow students to pass exams with passing thresholds of without answering a single question correctly. Furthermore, we show that such triggers generalize across models and datasets in this scenario, nullifying the defense strategy of keeping grading models or data secret.
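
To make the attack setting concrete, the following minimal sketch shows what prepending a trigger to every answer means for an exam-level pass decision. It is an illustration only and not the authors' method: `toy_grader` is a hypothetical stand-in for a learned grading model, the trigger token, the sample answers, and `PASS_THRESHOLD` are all assumptions chosen for the example, and no trigger search is performed.

```python
from typing import Callable, List

# Hypothetical passing threshold, chosen only for illustration.
PASS_THRESHOLD = 0.6


def apply_trigger(trigger: str, answer: str) -> str:
    """Prepend a trigger token sequence to a student answer."""
    return f"{trigger} {answer}".strip()


def exam_score(grader: Callable[[str], bool], answers: List[str]) -> float:
    """Return the fraction of answers the grader accepts as correct."""
    return sum(grader(a) for a in answers) / len(answers)


def toy_grader(answer: str) -> bool:
    """Hypothetical grader with a surface-level bias (assumption).

    It accepts any answer containing the token "therefore", which mimics
    a model that has latched onto a spurious cue instead of the content.
    """
    return "therefore" in answer.lower().split()


# Deliberately incorrect answers (made up for this sketch).
wrong_answers = [
    "plants eat soil",
    "i do not know",
    "the sun is cold",
    "water is dry",
]

# Score without and with the (assumed) universal trigger.
baseline = exam_score(toy_grader, wrong_answers)
attacked = exam_score(
    toy_grader, [apply_trigger("therefore", a) for a in wrong_answers]
)

passes_without_trigger = baseline >= PASS_THRESHOLD
passes_with_trigger = attacked >= PASS_THRESHOLD
```

In this toy setup the same trigger works for every answer, which is what makes a trigger "universal": once found, it needs no per-question adaptation.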

References

  1. Ahmadi, A.: Cheating on exams in the Iranian EFL context. J. Acad. Ethics 10(2), 151–170 (2012)
  2. Akhtar, N., Mian, A.: Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access 6, 14410–14430 (2018)
  3. Alzantot, M., Sharma, Y., Elgohary, A., Ho, B.J., Srivastava, M., Chang, K.W.: Generating natural language adversarial examples. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2890–2896 (2018)
  4. Basu, S., Jacobs, C., Vanderwende, L.: Powergrading: a clustering approach to amplify human effort for short answer grading. Trans. Assoc. Comput. Linguist. 1, 391–402 (2013)
  5. Behjati, M., Moosavi-Dezfooli, S.M., Baghshah, M.S., Frossard, P.: Universal adversarial attacks on text classifiers. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7345–7349. IEEE (2019)
  6. Belinkov, Y., Bisk, Y.: Synthetic and natural noise both break neural machine translation. arXiv preprint arXiv:1711.02173 (2017)
  7. Burrows, S., Gurevych, I., Stein, B.: The eras and trends of automatic short answer grading. Int. J. Artif. Intell. Educ. 25(1), 60–117 (2015)
  8. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM (2017)
  9. Danielsen, R.D., Simon, A.F., Pavlick, R.: The culture of cheating: from the classroom to the exam room. J. Phys. Assist. Educ. (Phys. Assist. Educ. Assoc.) 17(1), 23–29 (2006)
  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
  11. Diekhoff, G.M., LaBeff, E.E., Shinohara, K., Yasukawa, H.: College cheating in Japan and the United States. Res. High. Educ. 40(3), 343–353 (1999)
  12. Dzikovska, M.O., et al.: SemEval-2013 task 7: the joint student response analysis and 8th recognizing textual entailment challenge. Technical report. North Texas State Univ., Denton (2013)
  13. Ebrahimi, J., Rao, A., Lowd, D., Dou, D.: HotFlip: white-box adversarial examples for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 31–36 (2018)
  14. Galhardi, L.B., Brancher, J.D.: Machine learning approach for automatic short answer grading: a systematic review. In: Simari, G.R., Fermé, E., Gutiérrez Segura, F., Rodríguez Melquiades, J.A. (eds.) Advances in Artificial Intelligence - IBERAMIA 2018, pp. 380–391. Springer, Cham (2018)
  15. Gao, H., Oates, T.: Universal adversarial perturbation for text classification. arXiv preprint arXiv:1910.04618 (2019)
  16. Horbach, A., Pinkal, M.: Semi-supervised clustering for short answer scoring. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
  17. Iyyer, M., Wieting, J., Gimpel, K., Zettlemoyer, L.: Adversarial example generation with syntactically controlled paraphrase networks. In: Proceedings of NAACL-HLT, pp. 1875–1885 (2018)
  18. King, C.G., Guyette, R.W., Piotrowski, C.: Online exams and cheating: an empirical analysis of business students’ views. J. Educ. Online 6(1), n1 (2009)
  19. Klein, H.A., Levenburg, N.M., McKendall, M., Mothersell, W.: Cheating during the college years: how do business school students compare? J. Bus. Ethics 72(2), 197–206 (2007)
  20. Kumar, S., Chakrabarti, S., Roy, S.: Earth mover’s distance pooling over Siamese LSTMs for automatic short answer grading. In: IJCAI, pp. 2046–2052 (2017)
  21. Leacock, C., Chodorow, M.: C-rater: automated scoring of short-answer questions. Comput. Humanit. 37(4), 389–405 (2003)
  22. Liang, B., Li, H., Su, M., Bian, P., Li, X., Shi, W.: Deep text classification can be fooled. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 4208–4215. AAAI Press (2018)
  23. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
  24. Marvaniya, S., Saha, S., Dhamecha, T.I., Foltz, P., Sindhgatta, R., Sengupta, B.: Creating scoring rubric from representative student answers for improved short answer grading. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, pp. 993–1002. Association for Computing Machinery, New York (2018). 10.1145/3269206.3271755
  25. Mohler, M., Bunescu, R., Mihalcea, R.: Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 752–762. Association for Computational Linguistics (2011)
  26. Murdock, T.B., Anderman, E.M.: Motivational perspectives on student cheating: toward an integrated model of academic dishonesty. Educ. Psychol. 41(3), 129–145 (2006)
  27. Padó, U.: Get semantic with me! the usefulness of different feature types for short-answer grading. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2186–2195 (2016)
  28. Pedersen, T., Patwardhan, S., Michelizzi, J.: Wordnet::similarity: measuring the relatedness of concepts. In: Demonstration Papers at HLT-NAACL 2004, pp. 38–41. Association for Computational Linguistics (2004)
  29. Ren, S., Deng, Y., He, K., Che, W.: Generating natural language adversarial examples through probability weighted word saliency. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1085–1097 (2019)
  30. Ribeiro, M.T., Singh, S., Guestrin, C.: Semantically equivalent adversarial rules for debugging NLP models. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 856–865 (2018)
  31. Riordan, B., Horbach, A., Cahill, A., Zesch, T., Lee, C.M.: Investigating neural architectures for short answer scoring. In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 159–168 (2017)
  32. Roy, S., Narahari, Y., Deshmukh, O.D.: A perspective on computer assisted assessment techniques for short free-text answers. In: Ras, E., Joosten-ten Brinke, D. (eds.) Computer Assisted Assessment. Research into E-Assessment, pp. 96–109. Springer, Cham (2015)
  33. Saha, S., Dhamecha, T.I., Marvaniya, S., Sindhgatta, R., Sengupta, B.: Sentence level or token level features for automatic short answer grading? Use both. In: Penstein Rosé, C., et al. (eds.) Artificial Intelligence in Education, pp. 503–517. Springer, Cham (2018)
  34. Sahu, A., Bhowmick, P.K.: Feature engineering and ensemble-based approach for improving automatic short-answer grading performance. IEEE Trans. Learn. Technol. 13(1), 77–90 (2019)
  35. Samanta, S., Mehta, S.: Towards crafting text adversarial samples. arXiv preprint arXiv:1707.02812 (2017)
  36. Sheard, J., Dick, M., Markham, S., Macdonald, I., Walsh, M.: Cheating and plagiarism: perceptions and practices of first year IT students. ACM SIGCSE Bull. 34, 183–187 (2002)
  37. Smyth, M.L., Davis, J.R.: An examination of student cheating in the two-year college. Commun. Coll. Rev. 31(1), 17–32 (2003)
  38. Sultan, M.A., Salazar, C., Sumner, T.: Fast and easy short answer grading with high accuracy. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1070–1075 (2016)
  39. Sung, C., Dhamecha, T.I., Mukhi, N.: Improving short answer grading using transformer-based pre-training. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) Artificial Intelligence in Education, pp. 469–481. Springer, Cham (2019)
  40. Tan, C., Wei, F., Wang, W., Lv, W., Zhou, M.: Multiway attention networks for modeling sentence pairs. In: IJCAI, pp. 4411–4417 (2018)
  41. Wallace, E., Feng, S., Kandpal, N., Gardner, M., Singh, S.: Universal adversarial triggers for attacking and analyzing NLP. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2153–2162 (2019)
  42. Whitley, B.E.: Factors associated with cheating among college students: a review. Res. High. Educ. 39(3), 235–274 (1998)
  43. Willis, A.: Using NLP to support scalable assessment of short free text responses. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 243–253 (2015)
  44. Yuan, X., He, P., Zhu, Q., Li, X.: Adversarial examples: attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 30(9), 2805–2824 (2019)
  45. Zehner, F., Sälzer, C., Goldhammer, F.: Automatic coding of short text responses via clustering in educational assessment. Educ. Psychol. Measur. 76(2), 280–303 (2016)
  46. Zesch, T., Heilman, M., Cahill, A.: Reducing annotation efforts in supervised short answer scoring. In: Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 124–132 (2015)
  47. Zhang, H., Zhou, H., Miao, N., Li, L.: Generating fluent adversarial examples for natural languages. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5564–5569 (2019)
  48. Zhang, W.E., Sheng, Q.Z., Alhazmi, A., Li, C.: Adversarial attacks on deep learning models in natural language processing: a survey (2019)
  49. Zhu, Y., et al.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (2015)

  • Published in

    Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, July 6–10, 2020, Proceedings, Part I
    Jul 2020
    659 pages
    ISBN: 978-3-030-52236-0
    DOI: 10.1007/978-3-030-52237-7

    © Springer Nature Switzerland AG 2020

    Publisher

    Springer-Verlag

    Berlin, Heidelberg
