Abstract
Although existing machine reading comprehension (MRC) models are making rapid progress on many datasets, they are far from robust. In this article, we propose an understanding-oriented machine reading comprehension model that addresses three kinds of robustness issues: over-sensitivity, over-stability, and generalization. Specifically, we first use a natural language inference (NLI) module to help the model understand the precise semantic meanings of input questions, which addresses over-sensitivity and over-stability. Then, in the machine reading comprehension module, we propose a memory-guided multi-head attention method that further deepens the model's understanding of input questions and passages. Third, we propose a multi-language learning mechanism to address the issue of generalization. Finally, these modules are integrated with a multi-task learning method. We evaluate our model on three benchmark datasets designed to measure robustness, including DuReader (robust) and two SQuAD-related datasets. Extensive experiments show that our model effectively addresses all three kinds of robustness issues, and it achieves much better results than the compared state-of-the-art models on all these datasets under different evaluation metrics, even under some extreme and unfair evaluations. The source code of our work is available at https://github.com/neukg/RobustMRC.
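The abstract does not specify how the memory guides the attention; the details are in the paper body and the released code. Purely as an illustrative sketch under assumptions of our own (a single persistent memory vector that biases the attention queries; function and variable names are hypothetical, not from the paper), a memory-guided multi-head attention step might look like this:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_guided_attention(Q, K, V, memory, n_heads):
    """Multi-head scaled dot-product attention whose queries are
    biased by a learned memory vector (hypothetical formulation).

    Q, K, V: (seq_len, d_model) arrays; memory: (d_model,) array.
    """
    seq_len, d_model = Q.shape
    d_head = d_model // n_heads
    out = np.zeros_like(Q)
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        q = Q[:, s] + memory[s]        # memory guides the queries
        k, v = K[:, s], V[:, s]
        scores = softmax(q @ k.T / np.sqrt(d_head))  # (seq_len, seq_len)
        out[:, s] = scores @ v
    return out
```

This is only one plausible reading of "memory-guided"; the authors' actual mechanism may differ (e.g., the memory could modulate keys, values, or the attention scores directly), so consult the linked repository for the authoritative implementation.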
An Understanding-oriented Robust Machine Reading Comprehension Model