Abstract
How can we determine the meaning of a word from the context in which it appears? This question, known as Word Sense Disambiguation (WSD), is one of the central and most difficult problems of Natural Language Processing (NLP). Word vectors extracted from neural network models have been successfully applied to the WSD problem. Accordingly, this article presents a new method to disambiguate Arabic words according to both their context in a source text and the era in which they emerged. Over the past few decades, many researchers have grappled with Arabic Word Sense Disambiguation.
The Arabic language can be divided into three major historical periods: old Arabic, middle-age Arabic, and contemporary Arabic. Of these, contemporary Arabic has received the most attention from researchers. The main aim of our work is to disambiguate Arabic words according to the historical period in which they appeared. To perform this task, we propose a method that uses contextualized word embeddings to capture the syntactic and semantic information of a word by taking its contextual uses into account. The key step is to convert both the senses and the contextual uses of an ambiguous word into vectors, and then determine which of the possible senses of the target word is closest to the given context.
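The final selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a distance measure, so cosine similarity is assumed here, and the function names, sense labels, and three-dimensional vectors are hypothetical stand-ins for real contextualized embeddings.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_sense(context_vec: Sequence[float],
                  sense_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the label of the sense vector most similar to the context vector.

    Assumption: similarity is measured with cosine similarity; the source
    only says the chosen sense should be "closest" to the context.
    """
    if not sense_vecs:
        raise ValueError("at least one candidate sense is required")
    return max(sense_vecs,
               key=lambda label: cosine_similarity(context_vec, sense_vecs[label]))


# Toy data (hypothetical): one vector per candidate sense of an ambiguous
# word, each sense tied to a historical period, plus one context vector.
senses = {
    "sense_old_period": [0.9, 0.1, 0.0],
    "sense_contemporary": [0.1, 0.8, 0.3],
}
context = [0.2, 0.7, 0.4]

predicted = closest_sense(context, senses)
print(predicted)  # sense_contemporary
```

In practice the sense and context vectors would come from a contextualized embedding model rather than being written by hand; the comparison step stays the same whichever model produces them.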
Disambiguating Arabic Words According to Their Historical Appearance in the Document Based on Recurrent Neural Networks