Abstract
The syllable is the smallest semantic unit of the Burmese language. This study proposes the first neural joint learning model for Burmese syllable segmentation, word segmentation, and part-of-speech (POS) tagging with BERT. The proposed model alleviates the problem of error propagation from syllable segmentation. More specifically, it extends the neural joint model for Vietnamese word segmentation, POS tagging, and dependency parsing [28] with a pre-training method for Burmese character, syllable, and word embeddings, together with BiLSTM-CRF-based neural layers. To evaluate the proposed model, we fine-tune multilingual BERT and carry out experiments on Burmese benchmark datasets. The results show that the proposed joint model achieves excellent performance.
- Chris Alberti, Kenton Lee, and Michael Collins. 2019. A BERT baseline for the natural questions. arXiv: Computation and Language.
- Cunli Mao, Zhibo Man, Zhengtao Yu, Zhenhan Wang, Shengxiang Gao, and Yafei Zhang. 2020. A Burmese dependency parsing method based on transfer learning. In Proceedings of the 2020 International Conference on Asian Language Processing (IALP’20). IEEE, 92–97.
- Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily Pitler, and Joshua Maynez. 2018. Morphosyntactic tagging with a meta-BiLSTM model over context sensitive token encodings. arXiv:1805.08237.
- Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, and Ting Liu. 2018. Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation. arXiv:1807.03121.
- Xinchi Chen, Xipeng Qiu, and Xuanjing Huang. 2016. A feature-enriched neural model for joint Chinese word segmentation and part-of-speech tagging. arXiv:1611.05384.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
- Chenchen Ding, Hnin Thu Zar Aye, Win Pa Pa, Khin Thandar Nwet, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2019. Towards Burmese (Myanmar) morphological analysis: Syllable-based tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, 1 (2019), 1–34.
- Chenchen Ding, Ye Kyaw Thu, Masao Utiyama, and Eiichiro Sumita. 2016. Word segmentation for Burmese (Myanmar). ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 15, 4 (2016), 1–10.
- Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2018. NOVA: A feasible and flexible annotation system for joint tokenization and part-of-speech tagging. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18, 2 (2018), 1–18.
- Chenchen Ding, Sann Su Su Yee, Win Pa Pa, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2020. A Burmese (Myanmar) treebank: Guideline and analysis. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, 3 (2020), 1–13.
- Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2012. Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 1045–1053.
- Tin Htay Hlaing and Yoshiki Mikami. 2014. Automatic syllable segmentation of Myanmar texts using finite state transducer. ICTer 6, 2 (2014).
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
- Hla Hla Htay and Kavi Narayana Murthy. 2008. Myanmar word segmentation using syllable level longest matching. In Proceedings of the 6th Workshop on Asian Language Resources.
- Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991.
- Wenbin Jiang, Liang Huang, Qun Liu, and Yajuan Lü. 2008. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL-08: HLT. 897–904.
- Zhanming Jie and Wei Lu. 2019. Dependency-guided LSTM-CRF for named entity recognition. arXiv:1909.10148.
- Dan Kondratyuk and Milan Straka. 2019. 75 languages, 1 model: Parsing universal dependencies universally. arXiv:1904.02099.
- Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1. Association for Computational Linguistics, 513–521.
- John Lafferty. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.
- Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv:1603.01360.
- Yang Liu. 2019. Fine-tune BERT for extractive summarization. arXiv:1903.10318.
- Zin Maung Maung and Yoshiki Mikami. 2008. A rule-based syllable segmentation of Myanmar text. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781.
- Aye Myat Mon, Soe Lai Phyue, Myint Myint Thein, Su Su Htay, and Thinn Thinn Win. 2010. Analysis of Myanmar word boundary and segmentation by using statistical approach. In Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE’10), Vol. 5. IEEE, V5-233–V5-237.
- Cynthia Myint. 2011. A hybrid approach for part-of-speech tagging of Burmese texts. In Proceedings of the 2011 International Conference on Computer and Management (CAMAN’11). IEEE, 1–4.
- Phyu Hninn Myint, Tin Myat Htwe, and Ni Lar Thein. 2011. Bigram part-of-speech tagger for Myanmar language. In Proceedings of 2011 International Conference on Information Communication and Management, Singapore. 147–152.
- Dat Quoc Nguyen. [n.d.]. A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing.
- Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT.
- Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv:1802.05365.
- Myat Lay Phyu and Kiyota Hashimoto. 2017. Burmese word segmentation with character clustering and CRFs. In Proceedings of the 2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE’17). IEEE, 1–6.
- Tao Qian, Yue Zhang, Meishan Zhang, Yafeng Ren, and Donghong Ji. 2015. A transition-based model for joint segmentation, POS-tagging and normalization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 1837–1846.
- Xian Qian and Yang Liu. 2012. Joint Chinese word segmentation, POS tagging and parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 501–511.
- Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. Science China Technological Sciences (2020), 1–26.
- Nuo Qun, Hang Yan, Xipeng Qiu, and Xuanjing Huang. 2020. Chinese word segmentation via BiLSTM+Semi-CRF with relay node. Journal of Computer Science and Technology 35, 5 (2020), 1115–1126.
- Lin Songkai, Mao Cunli, Yu Zhengtao, Guo Jianyi, Wang Hongbin, and Zhang Jiafu. 2018. A method of Myanmar word segmentation based on convolution neural network. Journal of Chinese Information Processing 6 (2018), 8.
- Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A joint model for video and language representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7464–7473.
- Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? CoRR abs/1905.05583 (2019). arXiv:1905.05583. http://arxiv.org/abs/1905.05583
- Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 1385–1394.
- Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT rediscovers the classical NLP pipeline. arXiv:1905.05950.
- Tun Thura Thet, Jin-Cheon Na, and Wunna Ko Ko. 2008. Word segmentation for the Myanmar language. Journal of Information Science 34, 5 (2008), 688–704.
- Ye Kyaw Thu, Andrew Finch, Eiichiro Sumita, and Yoshinori Sagisaka. 2014. Integrating dictionaries into an unsupervised model for Myanmar word segmentation. In Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing. 20–27.
- Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, and Eiichiro Sumita. 2016. Introducing the Asian language treebank (ALT). In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). 1574–1578.
- Yuxuan Wang, Wanxiang Che, Jiang Guo, Yijia Liu, and Ting Liu. 2019. Cross-lingual BERT transformation for zero-shot dependency parsing. arXiv:1909.06775.
- Liner Yang, Meishan Zhang, Yang Liu, Maosong Sun, Nan Yu, and Guohong Fu. 2017. Joint POS tagging and dependence parsing with transition-based neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 8 (2017), 1352–1358.
- Meishan Zhang, Nan Yu, and Guohong Fu. 2018. A simple and effective neural model for joint word segmentation and POS tagging. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 9 (2018), 1528–1538.
- Meishan Zhang, Yue Zhang, Wanxiang Che, and Ting Liu. 2014. Character-level Chinese dependency parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1326–1336.
- Shaoning Zhang, Cunli Mao, Zhengtao Yu, Hongbin Wang, Zhongwei Li, and Jiafu Zhang. 2018. Word segmentation for Burmese based on dual-layer CRFs. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18, 1 (2018), 1–11.
- Yue Zhang and Stephen Clark. 2008. Joint word segmentation and POS tagging using a single perceptron. In Proceedings of ACL-08: HLT. 888–896.
- Yue Zhang and Stephen Clark. 2010. A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 843–852.
- Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. 2013. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 647–657.
Index Terms
A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging