Improving Neural Machine Translation with Linear Interpolation of a Short-Path Unit

Published: 7 February 2020

Abstract

In neural machine translation (NMT), the source and target words sit at the two ends of a large deep neural network, normally mediated by a series of non-linear activations. The problem with such consecutive non-linear activations is that they significantly decrease the magnitude of the gradient in a deep neural network, and thus gradually weaken the interaction between source words and their translations. As a result, a source word may be incorrectly translated into a target word outside its set of translational equivalents. In this article, we propose short-path units (SPUs) to strengthen the association between source and target words by allowing information to flow effectively across adjacent layers via linear interpolation. In particular, we enrich three critical NMT components with SPUs: (1) an enriched encoding model, which linearly interpolates source word embeddings into source annotations; (2) an enriched decoding model, which enables the source context to flow linearly into target-side hidden states; and (3) an enriched output model, which further allows linear interpolation of target-side hidden states into output states. Experiments on Chinese-to-English, English-to-German, and low-resource Tibetan-to-Chinese translation tasks demonstrate that the linear interpolation of SPUs significantly improves overall translation quality by 1.88, 1.43, and 3.75 BLEU, respectively. Moreover, detailed analysis shows that our approach substantially strengthens the association between source and target words. These results indicate that the proposed model is effective in both rich- and low-resource scenarios.
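The core idea of the abstract, letting a lower-layer signal (e.g., a source word embedding) reach a higher-layer state through a gated linear interpolation instead of only through stacked non-linearities, can be sketched as follows. This is a minimal illustrative NumPy sketch: the function name `spu`, the concatenation-based sigmoid gate, and all variable names are our assumptions, not the authors' exact formulation.

```python
import numpy as np

def spu(lower, upper, W_g, b_g):
    """Short-path-style unit: gated linear interpolation of two adjacent layers.

    lower -- the lower-layer vector (e.g., a source word embedding)
    upper -- the upper-layer vector (e.g., the hidden state above it)
    The learned gate g decides, per dimension, how much of the lower
    layer passes straight through to the output.
    """
    z = W_g @ np.concatenate([lower, upper]) + b_g
    g = 1.0 / (1.0 + np.exp(-z))          # sigmoid gate in (0, 1)
    return g * lower + (1.0 - g) * upper  # element-wise convex combination

rng = np.random.default_rng(0)
d = 4
lower = rng.standard_normal(d)            # stand-in for a word embedding
upper = rng.standard_normal(d)            # stand-in for a hidden state
W_g = rng.standard_normal((d, 2 * d))     # gate parameters (learned in practice)
b_g = np.zeros(d)

out = spu(lower, upper, W_g, b_g)
print(out.shape)  # (4,)
```

Because the gate lies in (0, 1), each output dimension is a convex combination of the two inputs, so the lower layer's contribution is passed on linearly (undamped by any extra non-linearity), which is the gradient-preserving property the abstract attributes to SPUs.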


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 3
May 2020, 228 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3378675

Copyright © 2020 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 April 2019
• Revised: 1 December 2019
• Accepted: 1 December 2019
• Published: 7 February 2020
