
Efficient Low-Resource Neural Machine Translation with Reread and Feedback Mechanism

Published: 9 January 2020

Abstract

Making sufficient use of available information is a key problem in neural machine translation (NMT). In rich-resource NMT it is addressed effectively by leveraging large-scale bilingual sentence pairs, but in low-resource NMT the lack of bilingual sentence pairs leads to poor translation performance; taking full advantage of global information in the encoding-decoding process is therefore especially valuable in low-resource settings. In this article, we propose a novel reread-feedback NMT architecture (RFNMT) that exploits such global information. Our architecture builds on an improved sequence-to-sequence neural network and consists of a double-deck attention-based encoder-decoder framework, in which the information generated by the first-pass encoding and decoding flows into the second-pass encoding process for better parameter initialization and fuller use of information. Specifically, we first propose a “reread” mechanism that transfers the outputs of the first-pass encoder to the second-pass encoder, where they are used to initialize the second-pass encoder. Second, we propose a “feedback” mechanism that transfers the first-pass decoder’s outputs to the second-pass encoder via an importance weight model and an improved gated recurrent unit (GRU). Experiments on multiple datasets show that our approach achieves significant improvements over state-of-the-art NMT systems, especially in low-resource settings.
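To make the double-pass data flow concrete, here is a minimal PyTorch sketch of a reread-feedback encoder-decoder. It is an illustration under assumptions, not the authors’ implementation: attention is omitted, the importance weight model is reduced to a single learned scoring layer, and the paper’s improved GRU is approximated by concatenating a pooled feedback vector to the second-pass encoder input. All names and layer sizes (RereadFeedbackNMT, importance, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class RereadFeedbackNMT(nn.Module):
    """Minimal sketch of a reread-feedback (double-pass) encoder-decoder.

    Assumptions for illustration only: attention is omitted, the
    importance weight model is a single scoring layer, and the improved
    GRU is approximated by feeding the pooled feedback vector into a
    standard GRU alongside the source embeddings.
    """

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.enc1 = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Second-pass encoder also consumes the feedback vector.
        self.enc2 = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.dec1 = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.dec2 = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Importance-weight model: scores each first-pass decoder state.
        self.importance = nn.Linear(hid_dim, 1)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        src = self.src_emb(src_ids)                          # (B, S, E)
        # First pass: encode the source, then decode once.
        _, enc1_h = self.enc1(src)
        dec1_out, _ = self.dec1(self.tgt_emb(tgt_ids), enc1_h)
        # Feedback: pool first-pass decoder states by learned importance.
        w = torch.softmax(self.importance(dec1_out), dim=1)  # (B, T, 1)
        fb = (w * dec1_out).sum(dim=1, keepdim=True)         # (B, 1, H)
        fb = fb.expand(-1, src.size(1), -1)                  # (B, S, H)
        # Second pass: the "reread" state initializes the encoder, and
        # the feedback vector rides along with every source token.
        _, enc2_h = self.enc2(torch.cat([src, fb], dim=-1), enc1_h)
        dec2_out, _ = self.dec2(self.tgt_emb(tgt_ids), enc2_h)
        return self.out(dec2_out)                            # (B, T, V)
```

A forward pass with random token ids checks the shapes:

```python
model = RereadFeedbackNMT(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 15)),  # two source sentences
               torch.randint(0, 8000, (2, 12)))  # shifted target tokens
print(logits.shape)  # torch.Size([2, 12, 8000])
```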




Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 3
May 2020, 228 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3378675

          Copyright © 2020 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 January 2020
          • Accepted: 1 September 2019
          • Revised: 1 July 2019
          • Received: 1 May 2019
Published in TALLIP Volume 19, Issue 3


          Qualifiers

          • research-article
          • Research
          • Refereed
