note
Open Access

A Burmese (Myanmar) Treebank: Guideline and Analysis

Published: 09 January 2020

Abstract

A 20,000-sentence Burmese (Myanmar) treebank of news articles has been released under a CC BY-NC-SA license. Complete phrase structure annotation was developed for each sentence from the morphologically annotated data prepared in the previous work of Ding et al. [1]. As the final result of the Burmese component of the Asian Language Treebank Project, this is the first large-scale, open-access treebank for the Burmese language. The annotation details and features of this treebank are presented.

References

  1. Chenchen Ding, Hnin Thu Zar Aye, Win Pa Pa, Khin Thandar Nwet, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2019. Towards Burmese (Myanmar) morphological analysis: Syllable-based tokenization and part-of-speech tagging. ACM Trans. Asian Low-Resource Lang. Inf. Process. 19, 1 (2019), 5.
  2. Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2018. NOVA: A feasible and flexible annotation system for joint tokenization and part-of-speech tagging. ACM Trans. Asian Low-Resource Lang. Inf. Process. 18, 2 (2018), 17.
  3. Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2019. Burmese (Myanmar) Treebank of Asian Language Treebank Project. Retrieved from https://doi.org/10.5281/zenodo.3463010
  4. Chenchen Ding, Ye Kyaw Thu, Masao Utiyama, and Eiichiro Sumita. 2016. Word segmentation for Burmese (Myanmar). ACM Trans. Asian Low-Resource Lang. Inf. Process. 15, 4 (2016), 22.
  5. Daisuke Kawahara, Sadao Kurohashi, and Kôiti Hasida. 2002. Construction of a Japanese relevance-tagged corpus. In Proceedings of the Annual Language Resources and Evaluation Conference (LREC’02). 2008--2013.
  6. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Comput. Linguist. 19, 2 (1993), 313--330.
  7. Toshiaki Nakazawa, Katsuhito Sudoh, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, and Sadao Kurohashi. 2018. Overview of the 5th workshop on Asian translation. In Proceedings of the 5th Workshop on Asian Translation (WAT’18). 1--41.
  8. John Okell and Anna Allott. 2001. Burmese/Myanmar Dictionary of Grammatical Forms. Routledge.
  9. Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL’06). 433--440.
  10. Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Annual Language Resources and Evaluation Conference (LREC’12). 2089--2096.
  11. Hammam Riza, Michael Purwoadi, Teduh Uliniansyah, Aw Ai Ti, Sharifah Mahani Aljunied, Luong Chi Mai, Vu Tat Thang, Nguyen Phuong Thai, Vichet Chea, Rapid Sun, Sethserey Sam, Sopheap Seng, Khin Mar Soe, Khin Thandar Nwet, Masao Utiyama, and Chenchen Ding. 2016. Introduction of the Asian language treebank. In Proceedings of the Oriental International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques Conference (O-COCOSDA’16). 1--6.
  12. Sann Su Su Yee, Chenchen Ding, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2019. Modifying NOVA-annotated Myanmar data to universal part-of-speech tagset. In Proceedings of the International Conference on Computing Advancements (ICCA’19). 230--237.
  13. Soe Lai Phyue and Aye Thida. 2013. Unknown word detection via syntax analyze. IAES Int. J. Artif. Intell. 2, 3 (2013), 107--116.
  14. Win Win Thant, Tin Myat Htwe, and Ni Lar Thein. 2012. Parsing of Myanmar sentences with function tagging. Int. J. Nat. Lang. Comput. 1, 1 (2012), 9--27.
  15. Naiwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus. Nat. Lang. Eng. 11, 2 (2005), 207--238.

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 3
      May 2020
      228 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3378675
      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 January 2020
      • Accepted: 1 November 2019
      • Revised: 1 October 2019
      • Received: 1 June 2019

      Qualifiers

      • note
      • Research
      • Refereed
