short-paper

Conditional Random Fields for Korean Morpheme Segmentation and POS Tagging

Published: 12 June 2015

Abstract

There has been recent interest in statistical approaches to Korean morphological analysis. However, previous studies have relied mostly on generative models, including hidden Markov models (HMMs), and have not used discriminative models such as conditional random fields (CRFs). We present a two-stage discriminative approach based on CRFs for Korean morphological analysis. Similar to methods used for Chinese, we perform two CRF-based disambiguation procedures: (1) morpheme segmentation and (2) POS tagging. In morpheme segmentation, an input sentence is segmented into a sequence of morphemes, where each morpheme unit is either atomic or compound. In POS tagging, each morpheme (atomic or compound) is assigned a POS tag. Once POS tagging is complete, we post-process the compound morphemes: each compound morpheme is further decomposed into atomic morphemes using pre-analyzed patterns and generalized HMMs obtained from the given tagged corpus. Experimental results show the promise of our proposed method.
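The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes segmentation is cast as per-character B/I labeling (as in character-tagging approaches to Chinese segmentation), and it replaces trained CRF feature weights with hand-set scores. The function names, the scores, and the use of the Sejong-style tags NNG and JKB are all assumptions made for illustration, and the post-processing of compound morphemes is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def to_bi_labels(morphemes: Sequence[str]) -> List[Tuple[str, str]]:
    """Label each character B (begins a morpheme) or I (inside one)."""
    labeled = []
    for morpheme in morphemes:
        for i, ch in enumerate(morpheme):
            labeled.append((ch, "B" if i == 0 else "I"))
    return labeled


def from_bi_labels(chars: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Rebuild morpheme strings from per-character B/I labels."""
    morphemes: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not morphemes:
            morphemes.append(ch)
        else:
            morphemes[-1] += ch
    return morphemes


def viterbi(emission: Sequence[Dict[str, float]],
            transition: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> List[str]:
    """Return the highest-scoring label path under a linear-chain model.

    emission[t][y] scores label y at position t; transition[(p, y)] scores
    moving from label p to label y. Missing entries default to 0.0.
    In a trained CRF these scores would be weighted feature sums.
    """
    if not emission:
        return []
    best = [{y: emission[0].get(y, 0.0) for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emission)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transition.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transition.get((prev, y), 0.0)
                         + emission[t].get(y, 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Stage 1: morpheme segmentation as character labeling (hand-set toy scores).
chars = list("학교에")
seg_emission = [{"B": 1.0}, {"I": 0.8, "B": 0.2}, {"B": 0.9, "I": 0.3}]
seg_labels = viterbi(seg_emission, {}, ["B", "I"])
morphemes = from_bi_labels(chars, seg_labels)  # ['학교', '에']

# Stage 2: POS tagging over the segmented morphemes (hand-set toy scores).
pos_emission = [{"NNG": 1.0, "JKB": 0.1}, {"JKB": 0.7, "NNG": 0.4}]
pos_transition = {("NNG", "JKB"): 0.5}
pos_tags = viterbi(pos_emission, pos_transition, ["NNG", "JKB"])  # ['NNG', 'JKB']

print(list(zip(morphemes, pos_tags)))
```

Running the stages separately, as here, means the tagger only ever sees the single segmentation chosen in stage 1, so the toy decoder is reused unchanged with a different label set and different scores.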

