Abstract
In recent years, question answering systems (QAS) have been widely used in applications such as conversational agents, chatbots, and intelligent search. Depending on the scope of the information or knowledge a system processes, it can answer questions in either an open domain or a closed domain. Many approaches to the QA problem exist, but neural network models, particularly those based on machine reading comprehension (MRC), have yielded the most impressive and promising results. In this article, we build a closed-domain QAS for a low-resource language, Vietnamese, applied to "The Postgraduate Admission of Ho Chi Minh City University of Food Industry, Vietnam." In addition, we have created two datasets to serve our QAS: vi-SQuAD v1.1, automatically translated and then edited from SQuAD (the Stanford Question Answering Dataset), and HUFI-PostGrad, which was collected manually. The system relies on two main models: an intent classification model and a machine reading comprehension model. Initial experimental results show that our QAS performs encouragingly.
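The two-stage design described above, where an intent classifier first routes a question to the relevant knowledge and a reading-comprehension reader then extracts the answer, can be sketched as follows. This is a minimal illustration only: both components here are simple keyword-overlap stubs standing in for the trained neural models, and all intents, contexts, and function names are hypothetical, not taken from the paper's system.

```python
# Hypothetical two-stage QAS sketch: intent classification -> answer extraction.
# The stubs below stand in for the neural intent classifier and MRC reader.
from dataclasses import dataclass

@dataclass
class QAResult:
    intent: str
    answer: str

# Hypothetical intent -> context passages (the real system would draw on
# documents such as those in the HUFI-PostGrad collection).
CONTEXTS = {
    "tuition": "The postgraduate tuition fee is paid once per semester.",
    "deadline": "The application deadline is announced on the faculty website.",
}

INTENT_KEYWORDS = {
    "tuition": {"fee", "tuition", "cost"},
    "deadline": {"deadline", "date", "apply"},
}

def classify_intent(question: str) -> str:
    """Stub classifier: pick the intent whose keywords overlap the question most."""
    words = set(question.lower().split())
    return max(INTENT_KEYWORDS, key=lambda i: len(INTENT_KEYWORDS[i] & words))

def extract_answer(question: str, context: str) -> str:
    """Stub reader: return the context sentence with the largest word overlap."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def answer(question: str) -> QAResult:
    intent = classify_intent(question)
    return QAResult(intent, extract_answer(question, CONTEXTS[intent]))

print(answer("When is the deadline to apply?").intent)
```

In a deployed system the classifier stub would be replaced by a trained text classifier and the reader stub by an extractive MRC model that predicts an answer span; the routing structure, however, stays the same.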
Building a Closed-Domain Question Answering System for a Low-Resource Language