Abstract
Machine reading comprehension (MRC) is a natural language understanding task in which a system reads a text and then answers a specific question posed by a human. Large-scale, high-quality corpora are necessary for evaluating MRC models. MRC for the health sector has potential for practical applications; nevertheless, research in this domain is currently scarce. This article presents UIT-ViNewsQA, a new Vietnamese corpus for evaluating MRC models in the healthcare text domain. The corpus consists of 22,057 human-generated question-answer pairs. Crowd-workers created the questions and answers on a collection of 4,416 online Vietnamese healthcare news articles, where the answers are text spans extracted from the corresponding articles. We introduce a process for creating a high-quality corpus for the Vietnamese MRC task. Linguistically, our corpus covers diverse question and answer types. In addition, we conduct experiments comparing the effectiveness of different MRC methods based on neural network and transformer architectures. Experimental results on our corpus show that the MRC system based on the ALBERT architecture outperforms the neural network architectures and the BERT-based approach, achieving an exact match score of 65.26% and an F1-score of 84.89%. The best model still trails human performance by about 10.90% in F1-score, which indicates that building models that surpass humans on UIT-ViNewsQA remains a challenge for future research. Our corpus is publicly available for research purposes on our website: http://nlp.uit.edu.vn/datasets.
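The exact match and F1 scores reported above are the usual metrics for span-extraction MRC. The following is a minimal sketch of how such scores are commonly computed for a single prediction, assuming lowercasing and whitespace tokenization; the function names and the normalization are illustrative assumptions, not the evaluation script used for UIT-ViNewsQA.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (an assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted span and a gold answer span."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are typically the average of these per-question values, often taking the maximum over several gold answers when more than one is available.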
New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles