research-article

Deep Neural Network--based Machine Translation System Combination

Published: 4 August 2020

Abstract

Deep neural networks (DNNs) have substantially advanced the state of the art in natural language processing (NLP) through their capacity for feature learning and representation. In machine translation, one of the more challenging NLP tasks, neural machine translation (NMT) has emerged as a new approach that produces much more fluent output than statistical machine translation (SMT). However, SMT usually outperforms NMT in translation adequacy and word coverage, so combining the strengths of NMT and SMT is a promising direction. In this article, we propose a deep neural network-based system combination framework that leverages both minimum Bayes-risk decoding and multi-source NMT: it takes as input the N-best outputs of NMT and SMT systems and produces the final translation. In particular, we apply the proposed model to both RNN and self-attention networks with different segmentation granularities. We verify our approach empirically through a series of experiments on resource-rich Chinese⇒English and low-resource English⇒Vietnamese translation tasks. Experimental results demonstrate the effectiveness and generality of the proposed approach, which significantly outperforms both conventional system combination methods and the best individual system output.
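The abstract mentions minimum Bayes-risk (MBR) decoding over the pooled N-best outputs of NMT and SMT systems. As a rough illustration of the MBR step alone (not the authors' full framework, which also relies on a multi-source NMT model), the sketch below selects, from a pooled N-best list, the hypothesis with the highest expected gain against all the other hypotheses, each weighted by a softmax posterior over its score. Assumptions: the scores are log-probabilities already on a comparable scale across systems; the gain is a smoothed sentence-level BLEU (add-one smoothing on orders 2 to 4); the names `sentence_bleu` and `mbr_select` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU of `hyp` against `ref` (both token lists).

    Unigram precision is unsmoothed; orders 2..max_n use add-one smoothing
    so that short sentences do not collapse to zero. Used here as the MBR gain.
    """
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(len(hyp) - n + 1, 0)
        if n == 1:
            if match == 0:
                return 0.0  # no shared words at all
            log_prec += math.log(match / total)
        else:
            log_prec += math.log((match + 1) / (total + 1))
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec / max_n)


def mbr_select(nbest):
    """Pick the MBR hypothesis from a pooled N-best list.

    `nbest` is a list of (tokens, log_score) pairs, possibly from several
    systems. ASSUMPTION: the log-scores are already on a comparable scale.
    Every hypothesis also serves as a pseudo-reference, weighted by its
    softmax posterior; the candidate with the highest expected gain
    (equivalently, the lowest expected loss 1 - BLEU) is returned.
    """
    top = max(score for _, score in nbest)
    weights = [math.exp(score - top) for _, score in nbest]  # stable softmax
    z = sum(weights)
    posteriors = [w / z for w in weights]

    best, best_gain = None, float("-inf")
    for cand, _ in nbest:
        gain = sum(p * sentence_bleu(cand, ref)
                   for (ref, _), p in zip(nbest, posteriors))
        if gain > best_gain:  # ties keep the earlier candidate
            best, best_gain = cand, gain
    return best


if __name__ == "__main__":
    # Hypothetical pooled N-best list drawn from different systems.
    pooled = [
        ("the cat sat on the mat".split(), -1.0),
        ("the cat sat on a mat".split(), -1.2),
        ("a cat is sitting on the mat".split(), -2.5),
    ]
    print(" ".join(mbr_select(pooled)))
```

Because MBR rewards hypotheses that agree with the rest of the pool, it tends to favour a consensus translation rather than simply the single highest-scoring output; how the per-system scores are calibrated before pooling is left open here.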



    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 5
      September 2020
      278 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3403646

      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 August 2020
      • Online AM: 7 May 2020
      • Accepted: 1 March 2020
      • Revised: 1 January 2020
      • Received: 1 August 2019


      Qualifiers

      • research-article
      • Research
      • Refereed
