Improving Data Augmentation for Low-Resource NMT Guided by POS-Tagging and Paraphrase Embedding

Published: 12 August 2021

Abstract

Data augmentation is a widely used approach for text generation tasks. In machine translation, and particularly in low-resource scenarios, many augmentation methods have been proposed, most of which generate pseudo data by omitting words, sampling randomly, or replacing some words in the text. However, these methods rarely guarantee the quality of the augmented data. In this work, we construct augmented data using paraphrase embeddings and POS tagging. Specifically, we generate a pseudo monolingual corpus by replacing words under the four main POS labels (noun, adjective, adverb, and verb), guided by both a paraphrase table and embedding similarity. We select the larger word-level paraphrase table, obtain an embedding for each word in the table, and compute the cosine similarity between these words and the tagged words in the original sequence. We then apply a ranking algorithm to choose highly similar words, which reduces semantic errors, while the POS-based replacement mitigates syntactic errors to some extent. Experimental results show that our augmentation method consistently outperforms previous state-of-the-art methods on seven low-resource language pairs drawn from four corpora, by 1.16 to 2.39 BLEU points.
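The replacement procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy embedding vectors, the `PARAPHRASE_TABLE` stub, and the `augment` helper are all hypothetical stand-ins for the word-level paraphrase table and pretrained embeddings the method assumes.

```python
import math

# Hypothetical toy embeddings; the method assumes an embedding for each
# word appearing in the word-level paraphrase table.
EMBEDDINGS = {
    "quick": [0.9, 0.1, 0.0],
    "fast":  [0.85, 0.15, 0.05],
    "rapid": [0.8, 0.2, 0.1],
    "slow":  [-0.7, 0.3, 0.2],
}

# Hypothetical paraphrase table mapping a word to candidate replacements.
PARAPHRASE_TABLE = {"quick": ["fast", "rapid", "slow"]}

# Only the four main POS categories are replaced: noun, adjective,
# adverb, verb (Penn Treebank-style tag prefixes assumed here).
REPLACEABLE = {"NN", "JJ", "RB", "VB"}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def augment(tagged_sentence):
    """Replace each eligible tagged word with its highest-ranked paraphrase.

    Candidates are ranked by cosine similarity to the original word,
    so only highly similar words are substituted (reducing semantic errors).
    """
    out = []
    for word, tag in tagged_sentence:
        candidates = PARAPHRASE_TABLE.get(word, [])
        if tag in REPLACEABLE and word in EMBEDDINGS and candidates:
            ranked = sorted(
                (c for c in candidates if c in EMBEDDINGS),
                key=lambda c: cosine(EMBEDDINGS[word], EMBEDDINGS[c]),
                reverse=True,
            )
            if ranked:
                out.append(ranked[0])  # keep the single most similar word
                continue
        out.append(word)  # words outside the four POS labels stay as-is
    return out

# "quick" (JJ) is replaced by its most similar paraphrase, "fast".
print(augment([("the", "DT"), ("quick", "JJ"), ("fox", "NN")]))
# → ['the', 'fast', 'fox']
```

In practice one would obtain the tagged input from an off-the-shelf POS tagger and the vectors from pretrained word embeddings; the ranking step is what filters out low-similarity paraphrase candidates such as `slow` above.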

