Abstract
We introduce a new classifier, Learning Word-vector Quantization (LWQ), to resolve morphological ambiguities in Turkish, an agglutinative language. First, a new morphologically annotated corpus is prepared, and its datasets are then derived through a series of processing steps. Using these datasets, LWQ finds optimal word-vector positions by moving the vectors in Euclidean space. LWQ performs morphological disambiguation in two steps: first, it generates all solution candidates for an ambiguous word using a morphological analyzer; second, it chooses the best candidate according to its total distance to neighboring words that are not ambiguous. To evaluate LWQ's performance, we conducted many tests on the corpus, taking the consistency of classification into account. In the experiments, we achieve a 98.4% correct classification ratio in choosing the correct parse output, which is an excellent result compared with the literature.
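The second disambiguation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "best" means the smallest summed Euclidean distance to the unambiguous neighbors, and the names `total_distance` and `choose_parse`, the parse labels, and the two-dimensional toy vectors are all hypothetical. The training phase that moves word vectors and the morphological analyzer that produces the candidates are not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def total_distance(candidate: Vector, neighbors: List[Vector]) -> float:
    """Sum of Euclidean distances from one candidate vector to all neighbor vectors."""
    return sum(math.dist(candidate, neighbor) for neighbor in neighbors)


def choose_parse(candidates: Dict[str, Vector], neighbors: List[Vector]) -> str:
    """Return the label of the candidate with the smallest total distance.

    Assumption: a smaller total distance to the unambiguous neighbors
    marks the better candidate; the abstract does not spell this out.
    """
    return min(candidates, key=lambda label: total_distance(candidates[label], neighbors))


# Hypothetical candidate parses of one ambiguous word (toy vectors, not real data).
candidates = {
    "parse_A": (0.0, 0.0),
    "parse_B": (3.0, 4.0),
}
# Vectors of the unambiguous words around it (also made up).
neighbors = [(0.0, 1.0), (1.0, 0.0)]

best = choose_parse(candidates, neighbors)
print(best)
```

In this toy setup `parse_A` lies at total distance 2.0 from the neighbors while `parse_B` lies much farther away, so `parse_A` is selected.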
Index Terms
Learning Word-vector Quantization: A Case Study in Morphological Disambiguation