Abstract
Wordnets are now widely used as a core resource in natural language processing and information retrieval, so their accuracy directly affects the performance of the applications that rely on them. This paper presents a fully automated method for extending a previously developed Persian wordnet with more comprehensive and accurate verb entries. First, Persian verbs are linked to Princeton WordNet (PWN) synsets using a bilingual dictionary. We then propose a feature set that captures the semantic behavior of compound verbs, which make up the majority of Persian verbs. These features are used in a supervised classifier that selects which candidate links to include in the wordnet. To build the training set, we draw on FarsNet, an existing Persian wordnet, together with a similarity-based method. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets, and 67,000 word-sense pairs, a substantial increase over the previous Persian wordnet, which had about 16,000 words, 22,000 PWN synsets, and 38,000 word-sense pairs.
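The pipeline outlined above (dictionary-based candidate links, features, supervised selection, seed labels from an existing wordnet) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's implementation: the romanized verbs, the miniature synset inventory with its ids and glosses, the three features (`agreement`, monosemy, and a nonverbal-element cue standing in for compound-verb features), and the perceptron, which is swapped in for whatever classifier the paper actually uses. The similarity-based part of training-set construction is omitted; here only a FarsNet-like seed dictionary supplies labels.

```python
# Minimal, hypothetical sketch: dictionary-based candidate links filtered by
# a supervised classifier. All data below is toy data, not the paper's.

# Hypothetical bilingual dictionary: romanized Persian verb -> English translations.
BILINGUAL = {
    "tasmim gereftan": ["decide"],   # compound: tasmim "decision" + light verb gereftan "take"
    "raftan": ["go", "leave"],       # simple verb
    "zamin khordan": ["fall"],       # compound: zamin "ground" + light verb khordan "eat/hit"
}
# English gloss of each compound verb's nonverbal element (assumed available).
NONVERBAL = {"tasmim gereftan": "decision", "zamin khordan": "ground"}

# Toy stand-in for PWN: illustrative synset id -> (lemmas, gloss).
SYNSETS = {
    "decide#1": ({"decide"}, "reach a decision about something"),
    "decide#2": ({"decide", "settle"}, "bring to an end or settle conclusively"),
    "go#1": ({"go"}, "change location or move"),
    "go#2": ({"go", "leave"}, "go away from a place"),
    "fall#1": ({"fall"}, "descend suddenly to the ground"),
    "fall#2": ({"fall"}, "decrease in size or value"),
}

# Seed links playing FarsNet's role: verbs it covers become labeled examples.
SEED = {"tasmim gereftan": {"decide#1"}, "raftan": {"go#2"}}


def candidates(verb):
    """Synsets containing at least one dictionary translation of the verb."""
    words = set(BILINGUAL[verb])
    return sorted(sid for sid, (lemmas, _) in SYNSETS.items() if lemmas & words)


def features(verb, sid):
    """Hypothetical features: bias, translation agreement, monosemy, nonverbal cue."""
    words = BILINGUAL[verb]
    lemmas, gloss = SYNSETS[sid]
    shared = [w for w in words if w in lemmas]
    agreement = len(shared) / len(words)
    # Fewer senses for a shared translation -> less ambiguous link.
    senses = min(sum(w in other for other, _ in SYNSETS.values()) for w in shared)
    # Compound-verb cue: does the gloss mention the nonverbal element?
    nonverbal = NONVERBAL.get(verb)
    cue = 1.0 if nonverbal and nonverbal in gloss.split() else 0.0
    return [1.0, agreement, 1.0 / senses, cue]


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(examples, epochs=20):
    """Classic perceptron: on each mistake, shift w toward or away from x."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if score(w, x) > 0 else 0
            if pred != y:
                step = y - pred  # +1 for a missed link, -1 for a wrong one
                w = [wi + step * xi for wi, xi in zip(w, x)]
    return w


# Label every candidate of a seed verb: 1 if the seed contains the link, else 0.
train_set = [(features(v, sid), 1 if sid in SEED[v] else 0)
             for v in SEED for sid in candidates(v)]
weights = train_perceptron(train_set)


def select(verb):
    """Keep the candidate links the trained classifier scores as positive."""
    return [sid for sid in candidates(verb) if score(weights, features(verb, sid)) > 0]


print(select("zamin khordan"))  # ['fall#1']
```

In this toy run, `fall#1` and `fall#2` have identical agreement and monosemy values, so only the nonverbal-element cue separates them. That mirrors the motivation for compound-verb features, but the paper's actual feature set is not reproduced here.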
Developing the Persian Wordnet of Verbs Using Supervised Learning