Abstract
Machine translation aims to bridge language barriers and reduce misinterpretation by translating between languages automatically. The quality of translations produced by corpus-based approaches depends predominantly on the availability of a large parallel corpus. Although machine translation of many Indian languages has progressively gained attention, there is very limited research on machine translation for a low-resource language such as Mizo, or on the challenges that different machine translation techniques face with it. In this article, we implement and compare statistical approaches with modern neural approaches for the English–Mizo language pair. We experiment with different tokenization methods, architectures, and configurations. The translations predicted by the trained models are evaluated using automatic and human evaluation measures. Furthermore, we analyze the models' prediction errors, examine how prediction quality varies with sentence length, and compare model performance with existing baselines.
An Improved English-to-Mizo Neural Machine Translation