Research Article

Multi-task Stack Propagation for Neural Quality Estimation

Published: 21 May 2019

Abstract

Quality estimation is an important task in machine translation that has attracted increasing interest in recent years. A key problem in translation-quality estimation is the scarcity of quality-annotated training data. To address this shortcoming, the Predictor-Estimator was recently proposed; it introduces “word prediction” as an additional pre-subtask that predicts each target word from the surrounding source and target contexts, resulting in a two-stage neural model composed of a predictor and an estimator. However, the original Predictor-Estimator is not trained as a continuous stacked model but in a cascaded manner that trains the predictor separately from the estimator. In addition, it is trained with single-task learning only, using the quality-estimation data of the target level without exploiting training data available from quality-estimation tasks at other levels. In this article, we therefore propose multi-task stack propagation, which applies stack propagation to fully train the Predictor-Estimator over a continuous stacked architecture and multi-task learning to augment the training data with related quality-estimation tasks at other levels. Experimental results on WMT17 quality-estimation datasets show that the Predictor-Estimator trained with multi-task stack propagation yields statistically significant improvements over baseline models. In particular, under an ensemble setting, the proposed multi-task stack propagation achieves state-of-the-art performance at the sentence, word, and phrase levels of the WMT17 quality-estimation tasks.
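As a rough illustration of the training idea (not the paper's actual neural model), the contrast between cascaded training and stack propagation can be sketched with a toy linear "predictor" shared by two estimator heads, one word-level and one sentence-level. All variable names, dimensions, and data here are invented for the sketch; the point is only that both task losses back-propagate through the shared lower stage in a single joint update.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 64, 8, 4
X = rng.normal(size=(n, d))                       # toy inputs (stand-in for source/target contexts)
F_true = X @ rng.normal(size=(d, h))
y_word = (F_true.sum(axis=1) > 0).astype(float)   # toy word-level OK/BAD labels
y_sent = np.tanh(F_true @ rng.normal(size=h))     # toy sentence-level quality scores

W_p = 0.1 * rng.normal(size=(d, h))   # shared predictor (lower stage of the stack)
w_w = 0.1 * rng.normal(size=h)        # word-level estimator head
w_s = 0.1 * rng.normal(size=h)        # sentence-level estimator head
lr = 0.02

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss():
    F = X @ W_p
    p = sigmoid(F @ w_w)
    s = F @ w_s
    bce = -np.mean(y_word * np.log(p) + (1 - y_word) * np.log(1 - p))
    mse = np.mean((s - y_sent) ** 2)
    return bce + mse          # multi-task objective: word-level + sentence-level

loss0 = joint_loss()
for _ in range(300):
    F = X @ W_p
    g_w = (sigmoid(F @ w_w) - y_word) / n   # word-task gradient at the logit
    g_s = 2 * (F @ w_s - y_sent) / n        # sentence-task gradient at the score
    # Stack propagation: gradients from BOTH tasks flow through the shared
    # predictor, instead of training predictor and estimator in separate
    # cascaded stages.
    W_p -= lr * X.T @ (np.outer(g_w, w_w) + np.outer(g_s, w_s))
    w_w -= lr * F.T @ g_w
    w_s -= lr * F.T @ g_s
loss1 = joint_loss()
print(loss0, loss1)
```

In a cascaded setup, the `W_p` update would use only the predictor's own pre-subtask loss; here the joint multi-task loss shapes the shared representation directly.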

