research-article

AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an

Published: 02 October 2020
Abstract

The lack of publicly available, reusable test collections for Arabic question answering on the Holy Qur’an has made it difficult to compare systems in that domain fairly. In this article, we introduce AyaTEC, a reusable test collection for verse-based question answering on the Holy Qur’an, which serves as a common experimental testbed for this task. AyaTEC includes 207 questions (with their corresponding 1,762 answers) covering 11 topic categories of the Holy Qur’an and targeting the information needs of both curious and skeptical users. Each answer is represented as a sequence of verses, and the answers were annotated to be as exhaustive as we could make them: that is, all Qur’anic verses that directly answer a question were extracted and annotated. To facilitate the use of AyaTEC in evaluating systems designed for this task, we propose several evaluation measures that support the different types of questions and the nature of verse-based answers while integrating the concept of partial matching of answers.
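
The abstract does not define the proposed evaluation measures, so the following is only an illustrative sketch of what partial matching between verse-based answers could look like, not the measures proposed in the article. It assumes an answer can be represented as a set of (chapter, verse) pairs and scores overlap with a standard F1. The names `Verse`, `verse_range`, and `partial_match_f1`, and the verse ranges used, are hypothetical and are not taken from AyaTEC.

```python
from typing import Set, Tuple

# Hypothetical representation: a verse is identified by (chapter number, verse number).
Verse = Tuple[int, int]


def verse_range(chapter: int, start: int, end: int) -> Set[Verse]:
    """Expand a contiguous run of verses within one chapter into a set of verse ids."""
    return {(chapter, v) for v in range(start, end + 1)}


def partial_match_f1(predicted: Set[Verse], gold: Set[Verse]) -> float:
    """Verse-overlap F1 between a predicted answer and a gold answer.

    This is an assumed, generic overlap score, not the article's measure.
    """
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Arbitrary illustrative ranges, not real annotations from the collection.
gold = verse_range(2, 10, 13)       # four verses
predicted = verse_range(2, 12, 15)  # four verses, two shared with gold
score = partial_match_f1(predicted, gold)
```

Under this sketch, a system answer that covers only part of the gold verse sequence receives partial credit rather than being scored as a complete miss, which is the general idea behind partial matching.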

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 6
November 2020
277 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3426881

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 2 October 2020
• Accepted: 1 May 2020
• Revised: 1 March 2020
• Received: 1 October 2019

Qualifiers

• research-article
• Research
• Refereed
