short-paper

Word Segmentation for Burmese Based on Dual-Layer CRFs

Published: 12 November 2018

Abstract

Burmese is an isolating language in which the syllable is the smallest linguistic unit. Word segmentation methods based on syllable matching perform only as well as the syllable segmentation they depend on. This article proposes a word segmentation method based on dual-layer conditional random fields (CRFs) that fuses syllable features. It combines syllable segmentation and word segmentation into a single process, thereby reducing the impact of syllable segmentation errors on Burmese word segmentation. In the first CRF layer, Burmese characters serve as atomic features and are combined with Backus normal form (BNF) features of Burmese syllable structure to tag the syllable sequence. In the second CRF layer, the tagged syllables are the input, and a feature template with syllables as atomic features is used to tag the word sequence. Experimental results show that the proposed method outperforms the method based on syllable matching.
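
The two-layer flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the B/I tag scheme, the one-unit context window, and the names window_features, merge_by_tags, and segment are all hypothetical, the BNF syllable-structure features of the first layer are omitted, and the two taggers are passed in as plain callables standing in for trained CRF models.

```python
from typing import Callable, Dict, List, Sequence

# A tagger maps one feature dict per unit to one tag per unit.
# In the paper these would be trained CRFs; here they are placeholders.
Tagger = Callable[[List[Dict[str, str]]], List[str]]


def window_features(units: Sequence[str], i: int, size: int = 1) -> Dict[str, str]:
    """Hypothetical feature template: the unit at i plus its neighbours."""
    feats = {"u[0]": units[i]}
    for k in range(1, size + 1):
        feats[f"u[-{k}]"] = units[i - k] if i - k >= 0 else "<BOS>"
        feats[f"u[+{k}]"] = units[i + k] if i + k < len(units) else "<EOS>"
    return feats


def merge_by_tags(units: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Join units into chunks: 'B' opens a chunk, any other tag extends it."""
    chunks: List[str] = []
    for unit, tag in zip(units, tags):
        if tag == "B" or not chunks:
            chunks.append(unit)
        else:
            chunks[-1] += unit
    return chunks


def segment(text: str, syllable_tagger: Tagger, word_tagger: Tagger) -> List[str]:
    """Layer 1 groups characters into syllables; layer 2 groups syllables into words."""
    chars = list(text)
    char_feats = [window_features(chars, i) for i in range(len(chars))]
    syllables = merge_by_tags(chars, syllable_tagger(char_feats))

    syl_feats = [window_features(syllables, i) for i in range(len(syllables))]
    return merge_by_tags(syllables, word_tagger(syl_feats))
```

Because the second layer consumes the first layer's output directly, both segmentations run as one pass over the input. Any sequence labeller that accepts per-unit feature dicts could be substituted for the two callables.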

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 18, Issue 1
      March 2019
      196 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3292011

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 November 2018
      • Accepted: 1 June 2018
      • Revised: 1 March 2018
      • Received: 1 October 2017

      Qualifiers

      • short-paper
      • Research
      • Refereed
