Abstract
Recent work achieved remarkable results in training neural machine translation (NMT) systems in a fully unsupervised way, with new and dedicated architectures that only rely on monolingual corpora. However, previous work also showed that unsupervised statistical machine translation (USMT) performs better than unsupervised NMT (UNMT), especially for distant language pairs. To take advantage of the superiority of USMT over UNMT, and considering that SMT suffers from well-known limitations overcome by NMT, we propose to define UNMT as NMT trained with the supervision of synthetic parallel data generated by USMT. This way we can exploit USMT up to its limits while ultimately relying on full-fledged NMT models to generate translations. We show significant improvements in translation quality over previous work and also that further improvements can be obtained by alternatively and iteratively training USMT and UNMT. Without the need of a dedicated architecture for UNMT, our simple approach can straightforwardly benefit from any recent and future advances in supervised NMT. Our systems achieve a new state-of-the-art for unsupervised machine translation in all of our six translation tasks for five diverse language pairs, surpassing even supervised SMT or NMT in some tasks. Furthermore, our analysis shows how crucial the comparability between the monolingual corpora used for unsupervised training is in improving translation quality.
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 789–798. https://aclweb.org/anthology/P18-1073.Google Scholar
Cross Ref
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. Unsupervised statistical machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3632–3642. https://aclweb.org/anthology/D18-1399.Google Scholar
Cross Ref
- Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2019. An effective approach to unsupervised machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 194–203. https://aclweb.org/anthology/P19-1019.Google Scholar
Cross Ref
- Mikel Artetxe, Gorka Labaka, Eneko Agirre, and Kyunghyun Cho. 2018. Unsupervised neural machine translation. In Proceedings of the 6th International Conference on Learning Representations. 12. https://openreview.net/forum?id=Sy2ogebAW.Google Scholar
- Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 Conference on Machine Translation (WMT19). In Proceedings of the 4th Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1). Association for Computational Linguistics, 1–61. https://aclweb.org/anthology/W19-5301.Google Scholar
- Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146. https://aclweb.org/anthology/Q17-1010.Google Scholar
Cross Ref
- Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, and Christof Monz. 2018. Findings of the 2018 conference on machine translation (WMT18). In Proceedings of the 3rd Conference on Machine Translation: Shared Task Papers. Association for Computational Linguistics, 272–303. https://aclweb.org/anthology/W18-6401.Google Scholar
- Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, and Wei Xu. 2017. Joint training for pivot-based neural machine translation. In Proceedings of the 26th International Joint Conferences on Artificial Intelligence. International Joint Conferences on Artificial Intelligence, 3974–3980. https://www.ijcai.org/proceedings/2017/0555.pdf.Google Scholar
Cross Ref
- Colin Cherry and George Foster. 2012. Batch tuning strategies for statistical machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 427–436. https://aclweb.org/anthology/N12-1047.Google Scholar
- Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2015. Improving fast_align by reordering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1034–1039. https://aclweb.org/anthology/D15-1119.Google Scholar
Cross Ref
- Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM Model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 644–648. https://aclweb.org/anthology/N13-1073.Google Scholar
- Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 489–500. https://aclweb.org/anthology/D18-1045.Google Scholar
Cross Ref
- Isao Goto, Ka Po Chow, Bin Lu, Eiichiro Sumita, and Benjamin K. Tsou. 2013. Overview of the Patent Machine Translation Task at the NTCIR Workshop. In Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies. 260–286. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings9/NTCIR/01-NTCIR9-PATENTMT-GotoI.pdf.Google Scholar
- Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. 2013. Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 690–696. https://aclweb.org/anthology/P13-2121.Google Scholar
- Liang Huang and David Chiang. 2007. Forest rescoring: Faster decoding with integrated language models. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Association for Computational Linguistics, 144–151. https://aclweb.org/anthology/P07-1019.Google Scholar
- Howard Johnson, Joel Martin, George Foster, and Roland Kuhn. 2007. Improving translation quality by discarding most of the phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Association for Computational Linguistics, 967–975. https://aclweb.org/anthology/D07-1103.Google Scholar
- Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, and Alexandra Birch. 2018. Marian: Fast neural machine translation in C++. In Proceedings of ACL 2018, System Demonstrations. Association for Computational Linguistics, 116–121. https://aclweb.org/anthology/P18-4020.Google Scholar
Cross Ref
- Tomoyuki Kajiwara and Mamoru Komachi. 2016. Building a monolingual parallel corpus for text simplification using sentence similarity based on alignment between word embeddings. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, 1147–1158. https://aclweb.org/anthology/C16-1109.Google Scholar
- Alexandre Klementiev, Ann Irvine, Chris Callison-Burch, and David Yarowsky. 2012. Toward statistical machine translation without parallel corpora. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 130–140. https://aclweb.org/anthology/E12-1014.Google Scholar
- Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. Association for Computational Linguistics, 177–180. https://aclweb.org/anthology/P07-2045.Google Scholar
Digital Library
- Patrik Lambert, Holger Schwenk, Christophe Servan, and Sadaf Abdul-Rauf. 2011. Investigations on translation model adaptation using monolingual data. In Proceedings of the 6th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 284–293. https://aclweb.org/anthology/W11-2132.Google Scholar
- Guillaume Lample and Alexis Conneau. 2019. Cross-lingual language model pretraining. CoRR abs/1901.07291 (2019), 10. http://arxiv.org/abs/1901.07291.Google Scholar
- Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018. Unsupervised machine translation using monolingual corpora only. In Proceedings of the 6th International Conference on Learning Representations. 14. https://openreview.net/forum?id=rkYTTf-AZ.Google Scholar
- Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc’Aurelio Ranzato. 2018. Phrase-based 8 neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 5039–5049. https://aclweb.org/anthology/D18-1549.Google Scholar
Cross Ref
- Benjamin Marie and Atsushi Fujita. 2018. Phrase table induction using monolingual data for low-resource statistical machine translation. ACM Transactions on Asian and Low-Resource Language Information Processing 17, 3, Article 16 (Feb. 2018), 25 pages. http://doi.acm.org/10.1145/3168054Google Scholar
Digital Library
- Benjamin Marie and Atsushi Fujita. 2018. Unsupervised neural machine translation initialized by unsupervised statistical machine translation. CoRR abs/1810.12703 (2018), 13. http://arxiv.org/abs/1810.12703.Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 3111–3119. https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.Google Scholar
Digital Library
- Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 295–302. https://aclweb.org/anthology/P02-1038.Google Scholar
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 311–318. https://aclweb.org/anthology/P02-1040.Google Scholar
- Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the 3rd Conference on Machine Translation: Research Papers. Association for Computational Linguistics, 186–191. https://aclweb.org/anthology/W18-6319.Google Scholar
Cross Ref
- Shuo Ren, Zhirui Zhang, Shujie Liu, Ming Zhou, and Shuai Ma. 2019. Unsupervised neural machine translation with SMT as posterior regularization. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence, 241–248. https://aaai.org/ojs/index.php/AAAI/article/view/3791.Google Scholar
Cross Ref
- Hammam Riza, Michael Purwoadi, Gunarso, Teduh Uliniansyah, Aw Ai Ti, Sharifah Mahani Aljunied, Luong Chi Mai, Vu Tat Thang, Nguyen Phuong Thai, Vichet Chea, Rapid Sun, Sethserey Sam, Sopheap Seng, Khin Mar Soe, K hin Thandar Nwet, Masao Utiyama, and Chenchen Ding. 2016. Introduction of the Asian Language Treebank. In Proceedings of the 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Technique (O-COCOSDA). 1–6. https://ieeexplore.ieee.org/document/7918974.Google Scholar
- Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2019. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. CoRR abs/1907.05791 (2019), 13. https://arxiv.org/abs/1907.05791.Google Scholar
- Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 86–96. https://aclweb.org/anthology/P16-1009.Google Scholar
Cross Ref
- Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of the 5th International Conference on Learning Representations. 10. https://openreview.net/forum?id=r1Aab85gg.Google Scholar
- Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 778–788. https://aclweb.org/anthology/P18-1072.Google Scholar
Cross Ref
- Yangqiu Song and Dan Roth. 2015. Unsupervised sparse vector densification for short text similarity. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1275–1280. https://aclweb.org/anthology/N15-1138.Google Scholar
Cross Ref
- Nicola Ueffing, Gholamreza Haffari, and Anoop Sarkar. 2007. Transductive learning for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. Association for Computational Linguistics, 25–32. https://aclweb.org/anthology/P07-1004.Google Scholar
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 5998–6008. https://papers.nips.cc/paper/7181-attention-is-all-you-need.Google Scholar
- Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. Unsupervised neural machine translation with weight sharing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 46–55. https://aclweb.org/anthology/P18-1005.Google Scholar
Cross Ref
- Richard Zens, Daisy Stanton, and Peng Xu. 2012. A systematic comparison of phrase table pruning techniques. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 972–983. https://aclweb.org/anthology/D12-1089.Google Scholar
- Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. Accelerating neural transformer via an average attention network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1789–1798. https://aclweb.org/anthology/P18-1166.Google Scholar
Cross Ref
- Kai Zhao, Hany Hassan, and Michael Auli. 2015. Learning translation models from monolingual continuous representations. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1527–1536. https://aclweb.org/anthology/N15-1176.Google Scholar
Cross Ref
Index Terms
Iterative Training of Unsupervised Neural and Statistical Machine Translation Systems
Recommendations
Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation
We propose a new method for inducing a phrase-based translation model from a pair of unrelated monolingual corpora. Our method is able to deal with phrases of arbitrary length and to find phrase pairs that are useful for statistical machine translation, ...
A Neural Network Classifier Based on Dependency Tree for English-Vietnamese Statistical Machine Translation
Computational Linguistics and Intelligent Text ProcessingAbstractReordering in MT is a major challenge when translating between languages with different of sentence structures. In Phrase-based statistical machine translation (PBSMT) systems, syntactic pre-ordering is a commonly used pre-processing technique. ...
Post-Ordering by Parsing with ITG for Japanese-English Statistical Machine Translation
Word reordering is a difficult task for translation between languages with widely different word orders, such as Japanese and English. A previously proposed post-ordering method for Japanese-to-English translation first translates a Japanese sentence ...






Comments