Abstract
Making sufficient use of available information is a key problem in neural machine translation (NMT). In rich-resource settings, it is effectively addressed by leveraging large-scale bilingual sentence pairs; in low-resource settings, however, the lack of bilingual sentence pairs leads to poor translation performance, so taking full advantage of global information in the encoding-decoding process becomes essential. In this article, we propose a novel reread-feedback NMT architecture (RFNMT) that exploits global information. Our architecture builds upon an improved sequence-to-sequence neural network and consists of a double-deck attention-based encoder-decoder framework, in which the information generated by the first-pass encoding and decoding flows into the second-pass encoding for better parameter initialization and fuller use of information. Specifically, we first propose a “reread” mechanism that transfers the outputs of the first-pass encoder to the second-pass encoder, where they are used to initialize it. We then propose a “feedback” mechanism that transfers the first-pass decoder’s outputs to the second-pass encoder via an importance weight model and an improved gated recurrent unit (GRU). Experiments on multiple datasets show that our approach achieves significant improvements over state-of-the-art NMT systems, especially in low-resource settings.
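To make the two-pass data flow concrete, the following is a minimal PyTorch sketch of the reread and feedback ideas described above, assuming a GRU-based sequence-to-sequence backbone. The module names, dimensions, the softmax-based importance weighting, and the sigmoid gate standing in for the paper’s improved GRU are all illustrative assumptions rather than the paper’s exact formulation.

```python
# Minimal sketch of the double-deck (two-pass) encode-decode flow.
# All names, sizes, and gating/weighting formulas are illustrative
# assumptions; they are not taken from the paper itself.
import torch
import torch.nn as nn

class RereadFeedbackSketch(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, emb)
        self.enc1 = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.dec1 = nn.GRU(2 * hid, 2 * hid, batch_first=True)
        # "Feedback": importance weights over first-pass decoder states.
        self.importance = nn.Linear(2 * hid, 1)
        # Gate fusing the feedback context into the second-pass encoder
        # input (stands in for the paper's improved GRU).
        self.gate = nn.Linear(4 * hid, 2 * hid)
        self.enc2 = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)

    def forward(self, src_ids):
        x = self.src_emb(src_ids)                      # (B, T, emb)
        h1, last1 = self.enc1(x)                       # first-pass encoding
        # First-pass decoding, crudely seeded with the encoder outputs.
        d1, _ = self.dec1(h1)
        # Feedback: importance-weighted summary of first-pass decoder states.
        w = torch.softmax(self.importance(d1), dim=1)  # (B, T, 1)
        ctx = (w * d1).sum(dim=1, keepdim=True).expand_as(h1)
        g = torch.sigmoid(self.gate(torch.cat([h1, ctx], dim=-1)))
        fused = g * h1 + (1 - g) * ctx
        # Reread: second-pass encoder initialized from first-pass final states.
        h2, _ = self.enc2(fused, last1)                # second-pass encoding
        return h2

# Shape check on toy data.
model = RereadFeedbackSketch()
out = model(torch.randint(0, 1000, (2, 7)))
print(out.shape)  # torch.Size([2, 7, 256])
```

In this sketch, only the hidden-state routing matters: the first-pass encoder’s final states initialize the second-pass encoder (“reread”), and a weighted summary of first-pass decoder states is gated into the second-pass encoder’s input (“feedback”); attention and the output layers are omitted for brevity.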