Abstract
In this article, we summarize the methodology and results of our two-year effort to construct a comprehensive WordNet for Turkish. In our approach, we mine a dictionary for synonym candidate pairs and manually mark the senses in which the candidates are synonymous. Every pair was marked twice, by different human annotators. We derive the synsets by finding the connected components of the graph whose edges connect synonymous senses. We also mined Turkish Wikipedia for hypernym relations among the senses. We analyzed the resulting WordNet to highlight the difficulties brought about by the dictionary construction methods of lexicographers. After splitting the unusually large synsets, we used random walk–based clustering that resulted in a Zipfian distribution of synset sizes. We compared our results to BalkaNet and automatic thesaurus construction methods using the variation of information metric. Our Turkish WordNet is available online.
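As a rough illustration of two steps named in the abstract, the sketch below groups senses into synsets by taking connected components of a synonymy graph, and compares two clusterings with the variation of information, VI(A, B) = H(A) + H(B) - 2 I(A; B). It is a minimal sketch and not the authors' implementation: the function names, the `word#sense` identifiers, and the toy Turkish pairs are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def synsets_from_pairs(synonym_pairs):
    """Return synsets as the connected components of the synonymy graph.

    Each pair (a, b) is an edge between two sense identifiers that
    annotators marked as synonymous. Uses a small union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for sense in list(parent):
        groups[find(sense)].add(sense)
    return sorted(sorted(group) for group in groups.values())


def variation_of_information(clustering_a, clustering_b):
    """Variation of information (in nats) between two clusterings.

    Both arguments are lists of clusters over the same set of items.
    """
    label_a = {item: i for i, c in enumerate(clustering_a) for item in c}
    label_b = {item: j for j, c in enumerate(clustering_b) for item in c}
    if label_a.keys() != label_b.keys():
        raise ValueError("clusterings must cover the same items")

    n = len(label_a)
    count_a = Counter(label_a.values())
    count_b = Counter(label_b.values())
    joint = Counter((label_a[x], label_b[x]) for x in label_a)

    vi = 0.0
    for (i, j), n_ij in joint.items():
        p_ij = n_ij / n
        # -p_ij * [log(p_ij / p_i) + log(p_ij / p_j)]
        vi -= p_ij * (math.log(p_ij / (count_a[i] / n))
                      + math.log(p_ij / (count_b[j] / n)))
    return vi


# Hypothetical annotated pairs; the sense identifiers are illustrative only.
pairs = [
    ("ev#1", "konut#1"),
    ("konut#1", "mesken#1"),
    ("araba#1", "otomobil#1"),
]
synsets = synsets_from_pairs(pairs)
```

Here `synsets` holds two groups, one for the three dwelling senses and one for the two car senses. A lower `variation_of_information` value means two clusterings agree more closely, and the value is zero when they are identical.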
Index Terms
Constructing a WordNet for Turkish Using Manual and Automatic Annotation