
Iterative Training of Unsupervised Neural and Statistical Machine Translation Systems

Published: 01 June 2020

Abstract

Recent work has achieved remarkable results in training neural machine translation (NMT) systems in a fully unsupervised way, with new and dedicated architectures that rely only on monolingual corpora. However, previous work has also shown that unsupervised statistical machine translation (USMT) performs better than unsupervised NMT (UNMT), especially for distant language pairs. To take advantage of the superiority of USMT over UNMT, and considering that SMT suffers from well-known limitations that NMT overcomes, we propose to define UNMT as NMT trained with the supervision of synthetic parallel data generated by USMT. This way, we can exploit USMT up to its limits while ultimately relying on full-fledged NMT models to generate translations. We show significant improvements in translation quality over previous work, and also that further improvements can be obtained by alternately and iteratively training USMT and UNMT. Because it needs no dedicated architecture for UNMT, our simple approach can straightforwardly benefit from any recent and future advances in supervised NMT. Our systems achieve a new state of the art for unsupervised machine translation in all six of our translation tasks for five diverse language pairs, surpassing even supervised SMT or NMT in some tasks. Furthermore, our analysis shows how crucial the comparability between the monolingual corpora used for unsupervised training is for improving translation quality.
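The alternating scheme described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `synthesize`, `iterative_training`, `train_nmt`, `train_smt`, and `rounds` are hypothetical, systems are modeled as plain string-to-string functions, and the translation direction and the way synthetic sentences are paired with monolingual ones are simplified because the abstract does not specify them.

```python
from typing import Callable, List, Tuple

Pairs = List[Tuple[str, str]]          # (monolingual sentence, synthetic translation)
Translator = Callable[[str], str]      # a trained system, modeled as a function
Trainer = Callable[[Pairs], Translator]


def synthesize(system: Translator, mono: List[str]) -> Pairs:
    """Translate monolingual sentences to build synthetic parallel data."""
    return [(sentence, system(sentence)) for sentence in mono]


def iterative_training(
    mono: List[str],
    initial_usmt: Translator,
    train_nmt: Trainer,
    train_smt: Trainer,
    rounds: int = 1,
) -> Translator:
    """Alternate NMT and SMT training on each other's synthetic data.

    Assumption: with rounds=0 this reduces to NMT trained once on
    synthetic data produced by the initial USMT system.
    """
    system = initial_usmt
    for _ in range(rounds):
        # NMT is supervised by synthetic data from the current SMT system.
        nmt = train_nmt(synthesize(system, mono))
        # A new SMT system is then trained on synthetic data from that NMT.
        system = train_smt(synthesize(nmt, mono))
    # The final translations come from an NMT model, as in the abstract.
    return train_nmt(synthesize(system, mono))
```

Here `train_nmt` and `train_smt` stand in for calls to real toolkits; any supervised NMT trainer could be passed in, which mirrors the claim that the approach needs no dedicated UNMT architecture.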

