short-paper

Constructing a WordNet for Turkish Using Manual and Automatic Annotation

Published: 23 April 2018

Abstract

In this article, we summarize the methodology and results of our two-year effort to construct a comprehensive WordNet for Turkish. In our approach, we mine a dictionary for synonym candidate pairs and manually mark the senses in which the candidates are synonymous. Every pair was marked twice, by different human annotators. We derive the synsets by finding the connected components of the graph whose edges connect synonymous senses. We also mined Turkish Wikipedia for hypernym relations among the senses. We analyzed the resulting WordNet to highlight the difficulties brought about by the dictionary construction methods of lexicographers. After splitting the unusually large synsets, we used random walk–based clustering, which resulted in a Zipfian distribution of synset sizes. We compared our results to BalkaNet and to automatic thesaurus construction methods using the variation of information metric. Our Turkish WordNet is available online.
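
As an illustration of two steps named in the abstract, the following Python sketch derives synsets as connected components of a synonymy graph and compares two groupings with the variation of information. It is not the authors' implementation: the function names, the sense identifiers such as "ev#1", and the toy synonym pairs are all assumptions made up for this example, and a union-find structure is only one of several ways to find connected components.

```python
from collections import defaultdict
from math import log


def synsets_from_pairs(senses, synonym_pairs):
    """Group senses into synsets: connected components of the synonymy graph."""
    parent = {}

    def find(x):
        # Register unseen senses, then follow parent links to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s in senses:
        find(s)
    for a, b in synonym_pairs:
        # An annotated synonym pair is an edge: merge the two components.
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for s in list(parent):
        groups[find(s)].add(s)
    return list(groups.values())


def variation_of_information(clustering_a, clustering_b):
    """VI(A, B) = H(A|B) + H(B|A) in nats, for two partitions of the same items."""
    n = sum(len(c) for c in clustering_a)
    vi = 0.0
    for a in clustering_a:
        for b in clustering_b:
            overlap = len(a & b)
            if overlap:
                r = overlap / n  # joint probability of landing in both a and b
                vi -= r * (log(r / (len(a) / n)) + log(r / (len(b) / n)))
    return vi


# Hypothetical toy data: word#sense identifiers invented for illustration.
senses = ["ev#1", "konut#1", "mesken#1", "araba#1", "otomobil#1", "kitap#1"]
pairs = [("ev#1", "konut#1"), ("konut#1", "mesken#1"), ("araba#1", "otomobil#1")]

synsets = synsets_from_pairs(senses, pairs)
singletons = [{s} for s in senses]
print(sorted(sorted(s) for s in synsets))
print(variation_of_information(synsets, singletons))
```

Because components are transitive closures, a single spurious synonym edge is enough to merge two otherwise unrelated groups, which is one way the unusually large synsets mentioned in the abstract can arise. The variation of information is zero for identical partitions and grows as the two groupings disagree.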

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
      September 2018
      196 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3184403

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 April 2018
      • Accepted: 1 February 2018
      • Revised: 1 January 2018
      • Received: 1 July 2017

      Qualifiers

      • short-paper
      • Research
      • Refereed
