research-article

I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

Published: 18 November 2021

Abstract

Treebanks are valuable linguistic resources that encode the syntactic structure of a language's sentences in addition to part-of-speech tags and morphological features. They are mainly used to train statistical parsers. Although statistical natural language parsers have recently become more accurate for languages such as English, those for Arabic still have low accuracy. The purpose of this article is to construct a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language, and to investigate their effect on the accuracy of statistical parsers. The proposed Arabic dependency treebank, called I3rab, differs from existing Arabic dependency treebanks in two main respects. The first is the approach to determining the main word of the sentence, and the second is the representation of joined and covert pronouns. To evaluate I3rab, we compared its performance against a subset of the Prague Arabic Dependency Treebank that shares a comparable level of detail. The experiments show improvements of up to 10.24% in unlabeled attachment score (UAS) and 18.42% in labeled attachment score (LAS).
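
The abstract reports parser accuracy as UAS and LAS. As a rough illustration of how these two scores are conventionally computed, the Python sketch below counts the tokens whose predicted head matches the gold head (UAS) and the tokens whose head and dependency label both match (LAS). The function name, the arc representation, and the toy labels are assumptions made for illustration; they are not taken from the article or from the evaluation tool it used.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Head index 0 stands for the artificial root. Labels are hypothetical.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions for one aligned gold/predicted parse."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")
    # UAS: the head alone must match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head and label must both match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, full_hits / n


# Toy four-token sentence (invented data, not from any treebank).
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In this toy case three of four heads are correct and two of four arcs are correct in both head and label, so LAS can never exceed UAS.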

      Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2
      March 2022
      413 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3494070

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 November 2021
      • Accepted: 1 June 2021
      • Revised: 1 April 2021
      • Received: 1 April 2020
      Qualifiers

      • research-article
      • Refereed