Abstract
Deep neural networks (DNNs) have substantially advanced the state of the art in natural language processing (NLP) through their capacity for feature learning and representation. Neural machine translation (NMT), one of the more challenging NLP tasks, has become a new approach to machine translation and generates much more fluent output than statistical machine translation (SMT). However, SMT is usually better than NMT in translation adequacy and word coverage. Combining the advantages of NMT and SMT is therefore a promising direction. In this article, we propose a deep neural network--based system combination framework that leverages both minimum Bayes-risk decoding and multi-source NMT; it takes as input the N-best outputs of NMT and SMT systems and produces the final translation. In particular, we apply the proposed model to both RNN and self-attention networks with different segmentation granularities. We verify our approach empirically through a series of experiments on resource-rich Chinese⇒English and low-resource English⇒Vietnamese translation tasks. Experimental results demonstrate the effectiveness and universality of the proposed approach, which significantly outperforms conventional system combination methods and the best individual system output.
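To make the minimum Bayes-risk step concrete, the following is a minimal sketch of MBR selection over a pooled N-best list. It is not the authors' implementation: the abstract does not specify the gain function or the posterior, so this sketch assumes a uniform posterior over candidates and uses a unigram-overlap F1 as a stand-in for a sentence-level metric such as BLEU. The names `unigram_f1`, `mbr_select`, and the toy sentences are hypothetical, introduced only for illustration.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1 between two sentences.

    Assumption: a simple stand-in for a sentence-level gain such as BLEU.
    It is a bag-of-words measure, so it ignores word order.
    """
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: list[str]) -> str:
    """Return the candidate with the highest average gain against the others.

    Assumption: every candidate is equally probable (uniform posterior), so
    minimizing expected loss reduces to maximizing the mean pairwise gain.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def expected_gain(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(unigram_f1(candidates[i], o) for o in others) / len(others)

    best_index = max(range(len(candidates)), key=expected_gain)
    return candidates[best_index]


# Toy N-best lists standing in for NMT and SMT system outputs (hypothetical).
nmt_nbest = ["the cat sat on the mat", "the cat sat on a mat"]
smt_nbest = ["the cat sits on the mat", "cat the mat on sat"]

pooled = nmt_nbest + smt_nbest
print(mbr_select(pooled))
```

In this sketch the selected hypothesis is simply the one that agrees most with the rest of the pool. The framework described in the abstract goes further by passing the N-best outputs to a multi-source NMT model that generates the final translation, which this example does not attempt to reproduce.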
Index Terms
Deep Neural Network--based Machine Translation System Combination