Abstract
Recent years have seen remarkable progress in natural language processing (NLP) worldwide, but comparatively little of it has addressed Indian regional languages. This work is a step towards building a target word sense disambiguation system for Malayalam, the regional language of the state of Kerala, India. Word Sense Disambiguation/Determination (WSD) is the task of identifying the correct sense of an ambiguous word from its context, and it is considered an AI-complete problem in NLP. For this purpose, a dedicated corpus of 1,147 contexts of target ambiguous words has been created; to the best of our knowledge, this is the first such corpus for Malayalam. This work describes how the performance of an unsupervised WSD approach based on Latent Dirichlet Allocation (LDA) can be improved using semantic features such as synonyms and co-occurrence information.
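The general shape of an unsupervised LDA-based approach like the one summarized above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Each context of the target word is treated as a short document, LDA is fitted with a plain collapsed Gibbs sampler using one topic per expected sense, and each context is labeled with its dominant topic. The two semantic features are approximated in a hypothetical way: co-occurrence information by a fixed window of words around the target, and synonyms by appending entries from a small lexicon. The helper names (`context_window`, `expand_with_synonyms`, `lda_gibbs`, `disambiguate`), the window size, the hyperparameters, and the English toy contexts are all illustrative assumptions.

```python
import random


def context_window(tokens, target, size=3):
    """Keep words within `size` positions of the first `target` occurrence (target excluded).

    Hypothetical stand-in for co-occurrence features.
    """
    if target not in tokens:
        return list(tokens)
    i = tokens.index(target)
    return tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]


def expand_with_synonyms(tokens, synonyms):
    """Append synonyms from a (hypothetical) lexicon so related words share evidence."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2i = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # tokens assigned to each topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w2i[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w2i[w], z[d][i]
                # Remove the current assignment before resampling it.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return ndk


def disambiguate(contexts, target, num_senses, synonyms, window=3, seed=0):
    """Label each context with its dominant LDA topic, read as an induced sense."""
    docs = [expand_with_synonyms(context_window(c, target, window), synonyms)
            for c in contexts]
    ndk = lda_gibbs(docs, num_senses, seed=seed)
    return [max(range(num_senses), key=lambda k: counts[k]) for counts in ndk]


# Toy English contexts standing in for the Malayalam corpus (illustrative only).
contexts = [
    "he deposited money in the bank account yesterday".split(),
    "the bank approved the loan and cash transfer".split(),
    "we sat on the river bank near the water".split(),
    "fish swim close to the muddy bank of the stream".split(),
]
synonyms = {"money": ["cash"], "river": ["stream"]}
print(disambiguate(contexts, "bank", num_senses=2, synonyms=synonyms))
```

Topic indices are arbitrary, so the output only shows which contexts were grouped together; mapping each topic to a dictionary sense would need a separate step, for example comparing topic words with sense glosses.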
Improved Word Sense Determination in Malayalam using Latent Dirichlet Allocation and Semantic Features