A Hierarchical Sequence-to-Sequence Model for Korean POS Tagging

Published: 23 April 2021

Abstract

Part-of-speech (POS) tagging is a fundamental task in natural language processing. Korean POS tagging consists of two subtasks: morphological analysis and POS tagging. In recent years, sequence-to-sequence (seq2seq) models have commonly been applied to this problem, and these seq2seq-based Korean POS tagging methods consider the full context of a sentence. However, Korean morphological analysis relies more on local contextual information, and in many cases there is a one-to-one match between a morpheme's surface form and its base form. To make better use of these characteristics, we propose a hierarchical seq2seq model. In our model, a low-level Bi-LSTM encodes the syllable sequence, a high-level Bi-LSTM models the context information of the whole sentence, and the decoder generates the morpheme base-form syllables as well as the POS tags. To improve the accuracy of morpheme base-form recovery, we introduce a convolution layer and an attention mechanism into the model. Experimental results on the Sejong corpus show that our model outperforms strong baseline systems in both morpheme-level F1-score and eojeol-level accuracy, achieving state-of-the-art performance.
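The hierarchical encoder described in the abstract can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the embedding size, hidden size, and the way an eojeol vector is formed (concatenating the final forward and first backward low-level states) are assumptions made here for illustration, and the convolution layer, attention mechanism, and decoder are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_params(input_dim, hidden_dim, rng):
    # One stacked weight matrix for the four LSTM gates (input, forget, cell, output).
    scale = 1.0 / np.sqrt(input_dim + hidden_dim)
    return {
        "W": rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm_run(params, xs, hidden_dim):
    # Run a unidirectional LSTM over a sequence of input vectors.
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x in xs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

def bilstm(fw, bw, xs, hidden_dim):
    # Bidirectional LSTM: run forward and backward, concatenate per time step.
    forward = lstm_run(fw, xs, hidden_dim)
    backward = lstm_run(bw, xs[::-1], hidden_dim)[::-1]
    return np.concatenate([forward, backward], axis=1)   # (T, 2 * hidden_dim)

rng = np.random.default_rng(0)
EMB, H = 8, 16   # assumed syllable-embedding and hidden sizes
low_fw, low_bw = lstm_params(EMB, H, rng), lstm_params(EMB, H, rng)
high_fw, high_bw = lstm_params(2 * H, H, rng), lstm_params(2 * H, H, rng)

def encode_sentence(eojeols):
    """eojeols: list of (num_syllables, EMB) arrays of syllable embeddings."""
    eojeol_vecs = []
    for syllables in eojeols:
        # Low-level Bi-LSTM over the syllables of one eojeol.
        states = bilstm(low_fw, low_bw, syllables, H)
        # Assumed eojeol vector: final forward state + first backward state.
        eojeol_vecs.append(np.concatenate([states[-1, :H], states[0, H:]]))
    # High-level Bi-LSTM over the eojeol vectors captures sentence context.
    return bilstm(high_fw, high_bw, np.array(eojeol_vecs), H)

# Toy sentence of three eojeols with 3, 2, and 4 syllables.
sent = [rng.normal(size=(3, EMB)), rng.normal(size=(2, EMB)), rng.normal(size=(4, EMB))]
enc = encode_sentence(sent)
print(enc.shape)  # → (3, 32): one 2H-dimensional context vector per eojeol
```

In the actual model, the decoder would attend over these representations to emit base-form syllables and POS tags for each eojeol.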

• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 2
  March 2021
  313 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3454116

      Copyright © 2021 Association for Computing Machinery.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 April 2021
      • Accepted: 1 November 2020
      • Revised: 1 September 2020
      • Received: 1 March 2020


      Qualifiers

      • research-article
      • Refereed