Abstract
Treebanks are valuable linguistic resources that record the syntactic structure of sentences in addition to part-of-speech tags and morphological features. They are used mainly to train statistical parsers. Although statistical parsers have recently become more accurate for languages such as English, parsers for Arabic still have low accuracy. This article constructs a new Arabic dependency treebank based on traditional Arabic grammatical theory and the characteristics of the Arabic language, and investigates their effect on the accuracy of statistical parsers. The proposed treebank, called I3rab, differs from existing Arabic dependency treebanks in two main respects: the approach used to determine the main word of the sentence, and the representation of joined and covert pronouns. To evaluate I3rab, we compared its performance against a subset of the Prague Arabic Dependency Treebank that shares a comparable level of detail. The experiments show a percentage improvement of up to 10.24% in unlabeled attachment score (UAS) and 18.42% in labeled attachment score (LAS).
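The two metrics reported above can be illustrated with a minimal sketch. UAS counts tokens whose predicted head is correct, and LAS counts tokens whose predicted head and dependency label are both correct. The function names, the toy sentence, and its labels are hypothetical and are not taken from I3rab or from the evaluation tool used in the article. The abstract does not say whether the reported improvement is relative or absolute, so the `relative_improvement` helper is only one possible reading.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 stands for the artificial root. Labels are hypothetical.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage improvement of `new` over `baseline` (assumed relative reading)."""
    return 100.0 * (new - baseline) / baseline


# Toy four-token sentence with made-up heads and labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In this toy case three of the four heads are correct and two of the four head and label pairs are correct, so LAS can never exceed UAS.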
I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory