Abstract
Nearly all work in neural machine translation (NMT) is limited to a fairly restricted vocabulary, crudely treating all other words as a single <unk> symbol. For morphologically rich languages, unknown (UNK) words also arise because the translation model fails to capture morphological variation. In this study, we explore two ways to alleviate the UNK problem in NMT: a new generative adversarial network (with added value constraints and semantic enhancement) and a preprocessing technique that mixes morphological noise into the training data. Training proceeds as an adversarial game among three sub-models (generator, filter, and discriminator). In this game, the filter directs the discriminator's attention to negative generations that contain noise, which improves training efficiency. Eventually, the discriminator can no longer easily distinguish between the negative samples produced by the generator with the filter and human translations. Experimental results show that the proposed method significantly improves over several strong baseline models across various language pairs and achieves state-of-the-art results on the newly emerged Mongolian-Chinese task.
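The morphological-noise preprocessing can be illustrated with a minimal sketch. The helper name, suffix inventory, and perturbation rule below are hypothetical, not the paper's actual scheme: with some probability, each word is stripped of or given a common suffix, so the model is exposed to morphological variants of in-vocabulary words during training.

```python
import random

def inject_morphological_noise(tokens, suffixes=("s", "ed", "ing"), p=0.3, seed=0):
    """Randomly strip or attach a suffix to simulate morphological variation.

    tokens   : list of source-side words
    suffixes : hypothetical suffix inventory (language-dependent)
    p        : probability of perturbing each word
    """
    rng = random.Random(seed)  # fixed seed keeps the noised corpus reproducible
    noisy = []
    for tok in tokens:
        if rng.random() >= p:
            noisy.append(tok)  # leave the word untouched
            continue
        # Strip a matching suffix if one is present; otherwise attach one.
        stripped = next((tok[: -len(s)] for s in suffixes
                         if tok.endswith(s) and len(tok) > len(s)), None)
        noisy.append(stripped if stripped is not None else tok + rng.choice(suffixes))
    return noisy
```

With p = 0 the corpus passes through unchanged, and with p = 1 every word is perturbed, so p controls how much morphological noise is mixed into training.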
Adversarial Training for Unknown Word Problems in Neural Machine Translation