Abstract
Part-of-speech (POS) tagging is a fundamental task in natural language processing. Korean POS tagging consists of two subtasks: morphological analysis and POS tagging. Recent work has tended to address this problem with sequence-to-sequence (seq2seq) models, which consider the full context of a sentence. However, Korean morphological analysis relies more on local contextual information, and in many cases the surface form of a morpheme matches its base form one-to-one. To make better use of these characteristics, we propose a hierarchical seq2seq model. In our model, the low-level Bi-LSTM encodes the syllable sequence, the high-level Bi-LSTM models the context of the whole sentence, and the decoder generates the syllables of the morpheme base forms together with the POS tags. To improve the accuracy of morpheme base form recovery, we add a convolution layer and an attention mechanism to the model. Experimental results on the Sejong corpus show that our model outperforms strong baseline systems in both morpheme-level F1-score and eojeol-level accuracy, achieving state-of-the-art performance.
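The two evaluation measures named in the abstract can be illustrated with a small sketch. The abstract does not define them, so the definitions below are assumptions based on common practice: morpheme-level F1 is computed over (base form, POS tag) pairs matched within each eojeol, and eojeol-level accuracy is the fraction of eojeols whose whole analysis is exactly right. The function names and the example sentence are hypothetical and are not taken from the paper.

```python
from collections import Counter

# An eojeol analysis is a list of (morpheme_base_form, pos_tag) pairs.
# A sentence is a list of eojeol analyses. These types are assumptions
# made for this sketch, not the paper's data format.


def morpheme_f1(gold, pred):
    """Morpheme-level F1 over (base form, tag) pairs, matched per eojeol."""
    true_pos = n_gold = n_pred = 0
    for gold_eojeol, pred_eojeol in zip(gold, pred):
        # Multiset intersection counts each correct morpheme at most once.
        overlap = Counter(gold_eojeol) & Counter(pred_eojeol)
        true_pos += sum(overlap.values())
        n_gold += len(gold_eojeol)
        n_pred += len(pred_eojeol)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def eojeol_accuracy(gold, pred):
    """Fraction of eojeols whose full morpheme/tag sequence is correct."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Hypothetical example: three eojeols with Sejong-style tags.
# In the last eojeol the surface form differs from the base forms,
# so the base form has to be recovered (갔 -> 가 + 았).
gold = [
    [("나", "NP"), ("는", "JX")],
    [("학교", "NNG"), ("에", "JKB")],
    [("가", "VV"), ("았", "EP"), ("다", "EF")],
]
pred = [
    [("나", "NP"), ("는", "JX")],
    [("학교", "NNG"), ("에", "JKB")],
    [("갔", "VV"), ("다", "EF")],  # base form not recovered
]

f1 = morpheme_f1(gold, pred)         # 5 correct of 6 predicted and 7 gold
accuracy = eojeol_accuracy(gold, pred)  # 2 of 3 eojeols fully correct
```

In this example a single base form recovery error lowers both measures: the eojeol counts as wrong as a whole, and only one of its three gold morphemes is matched.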
Index Terms
A Hierarchical Sequence-to-Sequence Model for Korean POS Tagging