Learning Word-vector Quantization: A Case Study in Morphological Disambiguation

Published: 18 June 2020

Abstract

We introduce a new classifier, Learning Word-vector Quantization (LWQ), to resolve morphological ambiguities in Turkish, an agglutinative language. We first build a new, morphologically annotated corpus and then derive its datasets through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving the vectors in Euclidean space. LWQ performs morphological disambiguation in two steps: first, it obtains all candidate solutions for an ambiguous word from a morphological analyzer; second, it chooses the best candidate according to its total distance to neighboring words that are not ambiguous. To evaluate LWQ's performance, we ran many tests on the corpus, taking the consistency of classification into account. In the experiments, LWQ chooses the correct parse output with a 98.4% correct classification ratio, a level that compares very well with results reported in the literature.
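The second step described in the abstract, choosing the candidate with the smallest total distance to unambiguous neighbor words, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the toy two-dimensional vectors, and the parse labels are hypothetical, and the update rule shown is the classic LVQ1 rule rather than the LWQ training procedure, which the abstract does not specify.

```python
import math


def total_distance(candidate_vec, neighbor_vecs):
    """Sum of Euclidean distances from one candidate vector to all neighbors."""
    return sum(math.dist(candidate_vec, n) for n in neighbor_vecs)


def choose_candidate(candidates, neighbor_vecs):
    """Return the label whose vector has the smallest total distance.

    candidates: dict mapping a (hypothetical) parse label to its vector.
    neighbor_vecs: vectors of nearby words that are not ambiguous.
    """
    return min(
        candidates,
        key=lambda label: total_distance(candidates[label], neighbor_vecs),
    )


def lvq1_update(vec, sample, same_class, lr=0.1):
    """Classic LVQ1 step (assumed here, not taken from the paper):
    move vec toward the sample if the classes match, away otherwise."""
    sign = 1.0 if same_class else -1.0
    return [v + sign * lr * (s - v) for v, s in zip(vec, sample)]


# Toy data: two candidate parses of one ambiguous word (labels are made up).
candidates = {"parse_a": [0.0, 0.0], "parse_b": [3.0, 3.0]}
# Vectors of two unambiguous words in the context window.
neighbors = [[0.5, 0.0], [0.0, 1.0]]

best = choose_candidate(candidates, neighbors)
print(best)
```

With these toy values, `parse_a` lies closer to both neighbors and is selected. In a real system the vectors would come from training on the annotated corpus and the candidates from a morphological analyzer.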
