Arabic Span Extraction-based Reading Comprehension Benchmark (ASER) and Neural Baseline Models

Published: 08 May 2023

Abstract

Machine reading comprehension (MRC) requires machines to read a given text and answer questions about it. Answers can be obtained either by generating them or by extracting them from the text; extraction amounts to predicting the first and last indices of the answer span within the paragraph. Training machines to answer questions requires datasets created for this purpose, and the lack of benchmark datasets for the Arabic language has hindered research into machine reading comprehension of Arabic text. This article proposes an Arabic Span-Extraction-based Reading Comprehension Benchmark (ASER) and complements it with neural baseline models for performance evaluation. The article details the steps taken to build and evaluate ASER, a manually created Arabic dataset for machine reading comprehension. It contains 10,000 records from different domains and is divided into training and testing sets. The evaluation shows that ASER is a challenging benchmark: the answers vary in length, and human performance reached an exact match of only 42%. Two baseline models were the focus of the ASER experiments: a sequence-to-sequence (Seq2Seq) model with different neural networks and the bidirectional attention flow (BIDAF) model. These experiments were run with different word embeddings, and all models achieved exact-match scores below human performance.
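The two mechanics the abstract relies on — recovering an answer from predicted (start, end) token indices, and scoring it with exact match — can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes whitespace tokenization and a strict string-equality exact-match metric, whereas the authors' actual preprocessing and scoring may differ.

```python
def extract_span(paragraph_tokens, start, end):
    """Return the answer text for a predicted (start, end) index pair, inclusive."""
    return " ".join(paragraph_tokens[start:end + 1])

def exact_match(prediction, gold):
    """Score 1 if the predicted answer string equals the reference exactly, else 0."""
    return int(prediction.strip() == gold.strip())

# Hypothetical example: the model predicts token indices 1..1 as the answer span.
tokens = "The Nile flows through eleven countries in Africa".split()
prediction = extract_span(tokens, 1, 1)   # -> "Nile"
score = exact_match(prediction, "Nile")   # -> 1
```

Because exact match gives no credit for partially overlapping spans, variable-length answers (as reported for ASER) make the metric hard to satisfy, which is consistent with the 42% human exact-match score.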

