Abstract
In this article, we show that word translations can be explicitly and effectively incorporated into NMT to avoid wrong translations. Specifically, we propose three cross-lingual encoders that explicitly incorporate word translations into NMT: (1) a Factored encoder, which encodes a word and its translation in a vertical way; (2) a Gated encoder, which uses a gating mechanism to selectively control how much word-translation information moves forward; and (3) a Mixed encoder, which jointly learns annotations over sequences in which words and their translations are alternately mixed. In addition, we first use a simple word-dictionary approach and then a word sense disambiguation (WSD) approach to effectively model the word context for better word translation. Experiments on Chinese-to-English translation demonstrate that all of the proposed encoders improve translation accuracy for both traditional RNN-based NMT and recent self-attention-based NMT (hereafter referred to as Transformer). Specifically, the Mixed encoder yields the largest improvement of 2.0 BLEU points on RNN-based NMT, while the Gated encoder improves Transformer by 1.2 BLEU points. These results indicate the usefulness of a WSD approach in modeling word context for better word translation, as well as the effectiveness of our proposed cross-lingual encoders in explicitly modeling word translations to avoid wrong translations in NMT. Finally, we discuss in depth, from several perspectives, how word translations benefit different NMT frameworks.
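As a rough illustration of the gating idea described above, the sketch below combines a source-word embedding with its translation's embedding through an element-wise sigmoid gate. This is a minimal toy sketch with made-up dimensions and randomly initialized weights, not the paper's actual model: the function name `gated_combine` and the specific parameterization (a single linear layer over the concatenated embeddings) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding size (hypothetical)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_combine(e_word, e_trans, W, b):
    """Compute a gated annotation of a word and its translation.

    The gate g (one value per dimension, in (0, 1)) controls how much
    of the translation embedding flows into the final annotation:
        h = g * e_word + (1 - g) * e_trans
    """
    g = sigmoid(W @ np.concatenate([e_word, e_trans]) + b)
    h = g * e_word + (1.0 - g) * e_trans
    return h, g

# Toy inputs: one word embedding and its dictionary translation's embedding.
e_word = rng.standard_normal(d)
e_trans = rng.standard_normal(d)
W = rng.standard_normal((d, 2 * d))  # gate parameters (randomly initialized)
b = np.zeros(d)

h, g = gated_combine(e_word, e_trans, W, b)
```

Because the gate lies strictly between 0 and 1, each dimension of `h` is a convex combination of the corresponding dimensions of the word and translation embeddings, so the encoder can lean toward either side per feature.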
Explicitly Modeling Word Translations in Neural Machine Translation