Word Segmentation for Burmese (Myanmar)

Published: 16 May 2016

Abstract

This note reports and discusses experiments on several word segmentation approaches for the Burmese language. Specifically, dictionary-based, statistical, and machine learning approaches are tested. The experimental results demonstrate that statistical and machine learning approaches perform significantly better than dictionary-based approaches. We believe that this note, based on an annotated corpus of relatively large size (approximately half a million words), is the first systematic comparison of word segmentation approaches for Burmese. This work aims to identify the properties of, and suitable approaches to, Burmese textual processing, and to promote further research on this understudied language.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 15, Issue 4
June 2016
173 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/2915955

Copyright © 2016 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 May 2016
• Accepted: 1 November 2015
• Received: 1 July 2015

Qualifiers

• note
• Research
• Refereed
