research-article

Developing the Persian Wordnet of Verbs Using Supervised Learning

Published: 26 May 2021

Abstract

Wordnets are widely used as a core resource in natural language processing and information retrieval, so their accuracy directly affects the performance of the applications that rely on them. This paper presents a fully automated method for extending a previously developed Persian wordnet with more comprehensive and accurate verb entries. First, a bilingual dictionary is used to link Persian verbs to Princeton WordNet (PWN) synsets. We then propose a feature set that captures the semantic behavior of compound verbs, which make up the majority of Persian verbs. A supervised classification system uses this feature set to select the correct links for inclusion in the wordnet. To produce the training set, we draw on a pre-existing Persian wordnet, FarsNet, together with a similarity-based method. The result is the largest automatically developed Persian wordnet, with more than 27,000 words, 28,000 PWN synsets, and 67,000 word-sense pairs; it substantially outperforms the previous Persian wordnet, which contains about 16,000 words, 22,000 PWN synsets, and 38,000 word-sense pairs.
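The abstract outlines a three-step pipeline: generate candidate (Persian verb, PWN synset) links through a bilingual dictionary, describe each link with features, and let a classifier trained on links from an existing wordnet decide which links to keep. The Python sketch that follows illustrates only that general flow, under loudly stated assumptions. The dictionary (`BILINGUAL`), the synset IDs (`SYNSETS`, toy names such as `listen#1`, not real PWN offsets), the seed labels (`SEED`, standing in for links taken from an existing wordnet such as FarsNet), and the helpers `candidate_links`, `features`, `train`, and `predict` are all hypothetical. The two features (translation support and inverse polysemy) are illustrative and are not the paper's compound-verb feature set; the similarity-based training-set step is not modeled; and plain logistic regression is swapped in as a generic supervised classifier.

```python
import math

# HYPOTHETICAL toy bilingual dictionary: Persian verb (transliterated) -> English translations.
BILINGUAL = {
    "gush dadan": ["listen", "hear"],  # compound verb: gush 'ear' + light verb dadan 'give'
    "kar kardan": ["work", "labor"],   # compound verb: kar 'work' + light verb kardan 'do'
    "didan": ["see", "watch"],         # simple verb 'to see'
}

# HYPOTHETICAL toy index: English verb -> candidate synset IDs (not real PWN offsets).
SYNSETS = {
    "listen": ["listen#1", "heed#1"],
    "hear": ["hear#1", "listen#1"],
    "work": ["work#1", "function#1"],
    "labor": ["work#1"],
    "see": ["see#1", "understand#1"],
    "watch": ["watch#1", "see#1"],
}


def candidate_links(verb):
    """Link a Persian verb to every synset of every one of its English translations."""
    return sorted({s for w in BILINGUAL[verb] for s in SYNSETS.get(w, [])})


def features(verb, synset):
    """Two illustrative features; NOT the feature set proposed in the paper."""
    translations = BILINGUAL[verb]
    hits = [w for w in translations if synset in SYNSETS.get(w, [])]
    support = len(hits) / len(translations)  # share of translations that reach this synset
    inv_polysemy = sum(1 / len(SYNSETS[w]) for w in hits) / len(hits)  # lower for ambiguous words
    return [support, inv_polysemy]


def predict(w, x):
    """Logistic score; w holds one weight per feature plus a trailing bias."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 / (1 + math.exp(-z))


def train(X, y, lr=0.5, epochs=2000):
    """Plain logistic regression fitted with per-example gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # gradient of the log loss w.r.t. the score
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


# HYPOTHETICAL seed labels standing in for links already present in an existing wordnet.
SEED = {
    ("gush dadan", "listen#1"): 1, ("gush dadan", "heed#1"): 0, ("gush dadan", "hear#1"): 0,
    ("kar kardan", "work#1"): 1, ("kar kardan", "function#1"): 0,
}

X = [features(v, s) for v, s in SEED]
y = list(SEED.values())
model = train(X, y)

# Keep only the candidate links of an unseen verb that the classifier accepts.
accepted = [s for s in candidate_links("didan") if predict(model, features("didan", s)) >= 0.5]
print(accepted)
```

On this toy data the script prints `['see#1']`: the only candidate reached by both translations of *didan* is accepted, while links supported by a single ambiguous translation are rejected. A real system would replace the toy tables with a full bilingual dictionary and PWN, and the two features with ones designed for Persian compound verbs.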



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 4, July 2021 (419 pages)
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3465463

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 26 May 2021
• Accepted: 1 February 2021
• Revised: 1 November 2020
• Received: 1 May 2019


Qualifiers

• research-article
• Refereed