Extracting Arabic Composite Names Using Genitive Principles of Arabic Grammar

Published: 07 June 2020

Abstract

Named Entity Recognition (NER) is a basic prerequisite of using Natural Language Processing (NLP) for information retrieval. Arabic NER is especially challenging because the language is morphologically rich, has short vowels, and has no capitalisation convention. This article presents a novel rule-based approach that uses linguistic grammar-based techniques to extract Arabic composite names from Arabic text. Our approach exploits Arabic genitive grammar rules, in particular the rules for identifying definite nouns (معرفة) and indefinite nouns (نكرة), to support the extraction of composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the developed approach formalises a set of syntactical rules and linguistic patterns that first use genitive patterns to classify definiteness within phrases and then extract proper composite names from the unstructured text. The approach places no constraints on the length of the Arabic composite name, and our initial experimentation demonstrated high recall and precision when the NER algorithm was applied to a financial domain corpus.
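The idea of classifying definiteness first and then collecting a genitive chain can be illustrated with a minimal sketch. It is not the article's rule set: the surface test for the definite article "ال", the `NOUN` tag, the function names, and the sample sentence are all assumptions made for illustration, and the input is assumed to be already tokenised and part-of-speech tagged.

```python
# Illustrative sketch only; not the rules formalised in the article.
# Assumption: tokens arrive as (token, tag) pairs from some upstream tagger.
from typing import List, Tuple

DEFINITE_ARTICLE = "ال"  # Arabic definite article "al-"


def is_definite(token: str) -> bool:
    """Crude surface check: the token starts with the definite article."""
    return token.startswith(DEFINITE_ARTICLE) and len(token) > len(DEFINITE_ARTICLE)


def genitive_chains(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect runs of indefinite nouns that are closed by a definite noun.

    The run may be of any length, so no limit is placed on the size of a
    candidate composite name.
    """
    chains: List[str] = []
    run: List[str] = []
    for token, tag in tagged:
        if tag != "NOUN":
            run = []  # a non-noun breaks the chain
            continue
        if is_definite(token):
            if run:  # at least one indefinite noun precedes the definite one
                chains.append(" ".join(run + [token]))
            run = []
        else:
            run.append(token)
    return chains


# Hypothetical tagged sentence: "The bank's board of directors announced today"
sample = [
    ("أعلن", "VERB"),
    ("مجلس", "NOUN"),
    ("إدارة", "NOUN"),
    ("البنك", "NOUN"),
    ("اليوم", "ADV"),
]
candidates = genitive_chains(sample)
```

The output is a list of candidates rather than confirmed names; a real system would still need further rules or domain knowledge to separate proper composite names from ordinary noun phrases.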

