Abstract
In this study, we propose novel sequence-to-sequence pre-training objectives for low-resource neural machine translation (NMT): Japanese-specific sequence-to-sequence (JASS) for language pairs involving Japanese as the source or target language, and English-specific sequence-to-sequence (ENSS) for language pairs involving English. JASS focuses on masking and reordering Japanese linguistic units known as bunsetsu, whereas ENSS is based on phrase-structure masking and reordering tasks. Experiments on ASPEC Japanese–English and Japanese–Chinese, Wikipedia Japanese–Chinese, and News English–Korean corpora demonstrate that JASS and ENSS outperform MASS and other existing language-agnostic pre-training methods by up to +2.9 BLEU points for the Japanese–English tasks, up to +7.0 BLEU points for the Japanese–Chinese tasks, and up to +1.3 BLEU points for the English–Korean tasks. An empirical analysis focusing on the relationship between the individual subtasks of JASS and ENSS reveals their complementary nature. Adequacy evaluation using LASER, human evaluation, and case studies reveal that our proposed methods significantly outperform pre-training methods without injected linguistic knowledge, and that they have a larger positive impact on adequacy than on fluency.
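To make the masking and reordering objectives concrete, the minimal Python sketch below builds the two kinds of pre-training pairs the abstract describes over pre-segmented bunsetsu. It is an illustration only, not the paper's implementation: the `bunsetsu_mask` and `bunsetsu_reorder` helpers, the `[MASK]` token, and the 50% masking ratio are assumptions, and a real pipeline would obtain bunsetsu from a Japanese analyzer such as JUMAN/KNP rather than take them as a given list.

```python
import random

MASK = "[MASK]"  # illustrative mask token; an assumption, not the paper's vocabulary

def bunsetsu_mask(bunsetsu, mask_ratio=0.5, seed=None):
    """MASS-style span masking over bunsetsu: the encoder input has a
    contiguous run of bunsetsu replaced by MASK tokens, and the decoder
    must generate the masked run. `bunsetsu` is a non-empty list of
    pre-segmented bunsetsu strings."""
    if not bunsetsu:
        raise ValueError("need at least one bunsetsu")
    rng = random.Random(seed)
    n = len(bunsetsu)
    span = max(1, int(n * mask_ratio))   # how many units to mask
    start = rng.randrange(n - span + 1)  # where the masked run begins
    source = bunsetsu[:start] + [MASK] * span + bunsetsu[start + span:]
    target = bunsetsu[start:start + span]
    return source, target

def bunsetsu_reorder(bunsetsu, seed=None):
    """Reordering task: the encoder sees the bunsetsu in shuffled order
    and the decoder must restore the original sentence order."""
    rng = random.Random(seed)
    shuffled = list(bunsetsu)
    rng.shuffle(shuffled)
    return shuffled, list(bunsetsu)

if __name__ == "__main__":
    # Pre-segmented bunsetsu for "私は 本を 読んだ" ("I read a book").
    sent = ["私は", "本を", "読んだ"]
    print(bunsetsu_mask(sent, seed=0))     # e.g. (['私は', '[MASK]', '読んだ'], ['本を'])
    print(bunsetsu_reorder(sent, seed=0))  # shuffled input paired with the original order
```

An ENSS-style variant would apply the same two operations to English phrase-structure constituents instead of bunsetsu; the key point in both cases is that the masked or shuffled units are linguistically motivated spans rather than arbitrary token windows.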
- [1] 2020. Optimizing transformer for low-resource neural machine translation. In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Barcelona, Spain (Online), 3429–3435.
- [2] 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics 7 (March 2019), 597–610.
- [3] 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA.
- [4] 2014. Constructing a Chinese–Japanese parallel corpus from Wikipedia. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC’14). European Language Resources Association (ELRA), Reykjavik, Iceland, 642–647.
- [5] 2016. Integrated parallel sentence and fragment extraction from comparable corpora: A case study on Chinese–Japanese Wikipedia. ACM Trans. Asian Low Resour. Lang. Inf. Process. 15, 2 (2016), 10:1–10:22.
- [6] 2019. Cross-lingual language model pretraining. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019), December 8-14, 2019, Vancouver, BC, Canada, 7057–7067.
- [7] 2019. Exploiting multilingualism through multistage fine-tuning for low-resource neural machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1410–1416.
- [8] 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.
- [9] 2015. Multi-task learning for multiple language translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 1723–1732.
- [10] 2018. Understanding back-translation at scale. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 489–500.
- [11] 2012. Head finalization reordering for Chinese-to-Japanese machine translation. In Proceedings of the 6th Workshop on Syntax, Semantics and Structure in Statistical Translation. Association for Computational Linguistics, Jeju, Republic of Korea, 57–66.
- [12] 2018. Iterative back-translation for neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. Association for Computational Linguistics, Melbourne, Australia, 18–24.
- [13] 2013. Two-stage pre-ordering for Japanese-to-English statistical machine translation. In Proceedings of the 6th International Joint Conference on Natural Language Processing. Asian Federation of Natural Language Processing, Nagoya, Japan, 1062–1066.
- [14] 2010. Head finalization: A simple reordering rule for SOV languages. In Proceedings of the Joint 5th Workshop on Statistical Machine Translation and Metrics (MATR). Association for Computational Linguistics, Uppsala, Sweden, 244–251.
- [15] 2017. Google’s multilingual neural machine translation system: Enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5 (2017), 339–351.
- [16] 2020. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics 8 (2020), 64–77.
- [17] 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017, System Demonstrations. Association for Computational Linguistics, Vancouver, Canada, 67–72.
- [18] 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Barcelona, Spain, 388–395.
- [19] 2006. Phrase reordering for statistical machine translation based on predicate-argument structure. In Proceedings of the 2006 International Workshop on Spoken Language Translation (IWSLT 2006), Keihanna Science City, Kyoto, Japan, November 27-28, 2006, 77–82.
- [20] 1994. Improvements of Japanese morphological analyzer JUMAN. In Proceedings of the International Workshop on Sharable Natural Language Resources, 22–28.
- [21] 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7871–7880.
- [22] 2020. Pre-training multilingual neural machine translation by leveraging alignment information. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 2649–2663.
- [23] 2020. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics 8 (2020), 726–742.
- [24] 2020. JASS: Japanese-specific sequence to sequence pre-training for neural machine translation. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 3683–3691.
- [25] 2018. Mixed precision training. In Conference Track Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, April 30-May 3, 2018.
- [26] 2015. Morphological analysis for unsegmented languages using recurrent neural network language model. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 2292–2297.
- [27] 2019. Addressing word-order divergence in multilingual neural machine translation for extremely low resource languages. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 3868–3873.
- [28] 2015. Overview of the 2nd workshop on Asian translation. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015). Workshop on Asian Translation, Kyoto, Japan, 1–28.
- [29] 2018. Overview of the 5th workshop on Asian translation. In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation. Association for Computational Linguistics, Hong Kong.
- [30] 2016. ASPEC: Asian scientific paper excerpt corpus. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC’16). European Language Resources Association (ELRA), Portorož, Slovenia, 2204–2208.
- [31] 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, PA, 311–318.
- [32] 2016. Korean language resources for everyone. In Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers. Seoul, South Korea, 49–58.
- [33] 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, LA, 2227–2237.
- [34] 1988. Information-Based Syntax and Semantics: Vol. 1: Fundamentals. Center for the Study of Language and Information.
- [35] 1994. Head-Driven Phrase Structure Grammar. The University of Chicago Press, Chicago, IL.
- [36] 2020. ProphetNet: Predicting future N-gram for sequence-to-sequence pre-training. In Findings of the Association for Computational Linguistics (EMNLP 2020). Association for Computational Linguistics, Online, 2401–2410.
- [37] 2018. When and why are pre-trained word embeddings useful for neural machine translation? In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, New Orleans, LA, 529–535.
- [38] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
- [39] 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21 (2020), 140:1–140:67.
- [40] 2019. Explicit cross-lingual pre-training for unsupervised machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 770–779.
- [41] 2016. Linguistic input features improve neural machine translation. In Proceedings of the 1st Conference on Machine Translation: Volume 1, Research Papers. Association for Computational Linguistics, Berlin, Germany, 83–91.
- [42] 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 86–96.
- [43] 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1715–1725.
- [44] 2019. Revisiting low-resource neural machine translation: A case study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 211–221.
- [45] 2020. Leveraging monolingual data with self-supervision for multilingual neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 2827–2835.
- [46] 2020. Pre-training via leveraging assisting languages for neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics, Online, 279–285.
- [47] 2019. MASS: Masked sequence to sequence pre-training for language generation. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), June 9-15, 2019, Long Beach, CA, 5926–5936.
- [48] 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), the 32nd Innovative Applications of Artificial Intelligence Conference (IAAI 2020), and the 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI 2020), New York, NY, February 7-12, 2020, 8968–8975.
- [49] 2014. Sequence-to-sequence learning with neural networks. In Proceedings of the 27th Neural Information Processing Systems Conference (NIPS), Montréal, Canada, 3104–3112.
- [50] 2017. Attention is all you need. In Proceedings of the 30th Neural Information Processing Systems Conference (NIPS), Long Beach, CA, 5998–6008.
- [51] 2019. Denoising based sequence-to-sequence pre-training for text generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 4003–4015.
- [52] 2020. Multi-task learning for multilingual neural machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 1022–1034.
- [53] 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019), December 8-14, 2019, Vancouver, BC, Canada, 5754–5764.
- [54] 2020. CSP: Code-switching pre-training for neural machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 2624–2636.
- [55] 2019. Quality estimation and translation metrics via pre-trained word and sentence embeddings. In Proceedings of the 4th Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2). Association for Computational Linguistics, Florence, Italy, 101–105.
- [56] 2016. Exploiting source-side monolingual data in neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 1535–1545.
- [57] 2020. Semantics-aware BERT for language understanding. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020), the 32nd Innovative Applications of Artificial Intelligence Conference (IAAI 2020), and the 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI 2020), New York, NY, February 7-12, 2020, 9628–9635.
- [58] 2019. Handling syntactic divergence in low-resource machine translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1388–1394.
- [59] 2020. LIMIT-BERT: Linguistics informed multi-task BERT. In Findings of the Association for Computational Linguistics (EMNLP 2020). Association for Computational Linguistics, Online, 4450–4461.
- [60] 2016. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 1568–1575.