Abstract
Named Entity Recognition (NER) is a basic prerequisite for applying Natural Language Processing (NLP) to information retrieval. Arabic NER is especially challenging because the language is morphologically rich, its short vowels are usually left unwritten, and it has no capitalisation convention. This article presents a novel rule-based approach that uses grammar-based linguistic techniques to extract composite names from Arabic text. Our approach uniquely exploits the Arabic genitive grammar rules, in particular the rules for identifying definite nouns (معرفة) and indefinite nouns (نكرة), to support the extraction of composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the approach formalises a set of syntactic rules and linguistic patterns that first use genitive patterns to classify definiteness within phrases and then extract proper composite names from unstructured text. The approach places no constraint on the length of an Arabic composite name, and our initial experiments showed high recall and precision when the NER algorithm was applied to a financial-domain corpus.
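The abstract does not give the rules themselves, so the following is only an illustrative sketch of the general idea, not the authors' rule set. It assumes the input is already tokenised and part-of-speech tagged (the `NOUN` tag and the function names `is_definite` and `extract_genitive_chains` are hypothetical), and it treats the written article "ال" as the sole marker of definiteness. Under those assumptions, a candidate composite name is a run of one or more bare nouns closed by a definite noun, with no limit on the length of the run.

```python
# Toy sketch of a genitive (construct-state) pattern matcher.
# Assumptions, not taken from the article: tokens arrive with POS tags,
# and definiteness is approximated by the written article "ال".

DEFINITE_ARTICLE = "ال"


def is_definite(token: str) -> bool:
    """Treat a token as definite if it starts with the article 'ال'.

    This is a crude approximation: some words begin with these letters
    without carrying the article, and attached prefixes are ignored.
    """
    return token.startswith(DEFINITE_ARTICLE) and len(token) > len(DEFINITE_ARTICLE)


def extract_genitive_chains(tagged):
    """Return candidate composite names as lists of tokens.

    A candidate is a run of one or more nouns without the article,
    closed by a noun that carries it. The run may be of any length.
    """
    chains, run = [], []
    for token, pos in tagged:
        if pos != "NOUN":
            run = []              # a non-noun breaks the chain
        elif is_definite(token):
            if run:               # need at least one head noun before it
                chains.append(run + [token])
            run = []
        else:
            run.append(token)     # bare noun: possible head of a chain
    return chains


if __name__ == "__main__":
    sentence = [
        ("قال", "VERB"),
        ("رئيس", "NOUN"),
        ("مجلس", "NOUN"),
        ("إدارة", "NOUN"),
        ("البنك", "NOUN"),
        ("أمس", "ADV"),
    ]
    for chain in extract_genitive_chains(sentence):
        print(" ".join(chain))
```

On the sample sentence, the sketch yields a single four-token candidate, which illustrates why a pattern of this kind does not need a fixed maximum name length. A real system would need morphological analysis to separate attached prefixes and to handle proper nouns that are definite without the article.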
Index Terms
Extracting Arabic Composite Names Using Genitive Principles of Arabic Grammar