Towards Machine Translation in Semantic Vector Space

Published: 20 April 2015

Abstract

Measuring the quality of translation rules and their composition is an essential issue in the conventional statistical machine translation (SMT) framework. In previous work, the lexical and phrasal probabilities that express translation quality are computed solely from co-occurrence statistics in the bilingual corpus, and may not be reliable due to data sparseness. To address this issue, we propose measuring the quality of translation rules and their composition in a semantic vector embedding space (VES). We present a recursive neural network (RNN)-based translation framework comprising two submodels. The first is a bilingually-constrained recursive auto-encoder, which converts lexical translation rules into compact real-valued vectors in the semantic VES. The second is a type-dependent recursive neural network, which performs decoding by minimizing the semantic gap (meaning distance) between the source-language string and its translation candidates at each state of a bottom-up derivation. The RNN-based translation model is trained with a max-margin objective that maximizes the margin between the reference translation and the n-best translations obtained by forced decoding. In our experiments, we first show that the proposed vector representations of translation rules are reliable enough for use in translation modeling. We further show that the proposed type-dependent RNN-based model significantly improves translation quality in a large-scale, end-to-end Chinese-to-English translation evaluation.
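To make the two ingredients of the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a recursive auto-encoder composes two child vectors into a parent vector with a single tanh layer, and a max-margin loss penalizes any n-best candidate whose model score comes within a fixed margin of the reference translation's score. The dimension `d`, the matrix `W`, and the scores are toy values chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension (illustrative only)

# Recursive composition: parent = tanh(W [c1; c2] + b),
# the standard recursive auto-encoder parent computation.
W = rng.normal(scale=0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(c1, c2):
    """Combine two child embeddings into one parent embedding."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

def max_margin_loss(score_ref, scores_nbest, margin=1.0):
    """Hinge loss: each candidate within `margin` of the reference
    translation's score contributes margin - score_ref + score_cand."""
    return sum(max(0.0, margin - score_ref + s) for s in scores_nbest)

c1, c2 = rng.normal(size=d), rng.normal(size=d)
parent = compose(c1, c2)
print(parent.shape)                        # (4,)
print(max_margin_loss(2.0, [0.5, 1.5]))    # 0.5 (only the 1.5 candidate violates the margin)
```

The bottom-up decoding described in the abstract would apply `compose` repeatedly along a derivation and score each candidate in the shared embedding space; training then minimizes `max_margin_loss` over forced-decoding n-best lists.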

