short-paper
Open Access

Layer-Wise De-Training and Re-Training for ConvS2S Machine Translation

Published: 01 November 2019

Abstract

The convolutional sequence-to-sequence (ConvS2S) system is a representative neural machine translation (NMT) architecture. In our pilot studies, training the ConvS2S model tended to get stuck in a local optimum. To overcome this behavior, we propose to mildly de-train a trained ConvS2S model and then retrain it to search for a better solution globally. Specifically, the trained parameters of one layer of the network are abandoned by re-initialization while the parameters of all other layers are kept, which launches re-optimization from a new starting point yet keeps that point close to the previous optimum. This procedure is repeated layer by layer until every layer of the ConvS2S model has been explored. Experiments show that, compared with other measures for escaping local optima, including initialization with different random seeds, adding perturbations to the baseline parameters, and continuing training (con-training) from the baseline models, our method consistently improves ConvS2S translation quality across various language pairs and achieves better overall performance.
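The layer-wise procedure described above can be sketched in a few lines. The snippet below is a hypothetical toy illustration, not the paper's ConvS2S code: the "model" is a stack of linear layers trained on random regression data, and the keep-only-if-better selection rule is this sketch's assumption. The essential step is faithful to the abstract: one layer at a time, its trained weights are abandoned by re-initialization while all other layers are kept, and the model is then re-optimized from that perturbed starting point.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, rng):
    # small random weights, a stand-in for the real initializer
    return rng.normal(scale=0.1, size=(n_in, n_out))

def forward(layers, x):
    h = x
    for W in layers:
        h = h @ W
    return h

def mse(layers, x, y):
    d = forward(layers, x) - y
    return float(np.mean(d * d))

def train(layers, x, y, steps=300, lr=0.005):
    # plain batch gradient descent on a linear stack (illustrative only)
    for _ in range(steps):
        acts = [x]
        for W in layers:
            acts.append(acts[-1] @ W)          # forward, caching inputs
        grad = 2.0 * (acts[-1] - y) / len(x)   # error signal at the output
        for i in reversed(range(len(layers))):
            g_w = acts[i].T @ grad             # gradient for layer i
            grad = grad @ layers[i].T          # propagate to layer below
            layers[i] = layers[i] - lr * g_w
    return layers

def detrain_retrain(layers, x, y, rng):
    # For each layer in turn: abandon that layer's trained parameters by
    # re-initialization, keep all other layers, retrain the whole model,
    # and keep the candidate only if it beats the current best solution.
    best = [W.copy() for W in layers]
    best_loss = mse(best, x, y)
    for k in range(len(best)):
        cand = [W.copy() for W in best]
        cand[k] = init_layer(*cand[k].shape, rng)  # de-train layer k only
        cand = train(cand, x, y)
        loss = mse(cand, x, y)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss

# toy regression data and a three-layer "model"
x = rng.normal(size=(32, 4))
y = rng.normal(size=(32, 2))
baseline = train([init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 2, rng)], x, y)
baseline_loss = mse(baseline, x, y)

best, best_loss = detrain_retrain(baseline, x, y, rng)
# by construction, best_loss can never exceed baseline_loss
```

Because only one layer is re-initialized at a time, the new starting point stays in the vicinity of the previous optimum, which is the "mild" de-training the abstract refers to; re-initializing all layers at once would instead amount to restarting from scratch with a new random seed.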



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 2
March 2020, 301 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3358605

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 December 2018
• Revised: 1 August 2019
• Accepted: 1 August 2019
• Published: 1 November 2019

Published in TALLIP Volume 19, Issue 2

      Qualifiers

      • short-paper
      • Research
      • Refereed
