Abstract
Statistical machine translation (SMT) models rely on word-, phrase-, and syntax-level alignments, whereas neural machine translation (NMT) models rarely learn phrase- and syntax-level alignments explicitly. In this article, we propose to improve NMT by explicitly learning bilingual syntactic constituent alignments. Specifically, we first use syntactic parsers to induce the syntactic structures of sentences, and then we propose two ways to exploit the syntactic constituents in a perceptual (not adversarial) generator-discriminator training framework. The first uses the constituents to measure the alignment score of sentence-level training examples. The second directly scores the alignments of constituent-level examples, which are generated by an algorithm based on word-level alignments from SMT. In our generator-discriminator framework, the discriminator is pre-trained to learn constituent alignments and to distinguish the ground-truth translation from fake ones, while the generative translation model is fine-tuned to absorb this alignment knowledge and to generate translations that best approximate the ground truth. Experiments and analysis show that the learned constituent alignments can help improve translation results.
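The abstract does not spell out how constituent-level examples are derived from SMT word alignments. One plausible reading, sketched below under that assumption, is the standard consistency criterion from phrase-based SMT: a source constituent span is paired with the smallest target span covering its aligned words, and the pair is kept only if no word inside that target span is aligned to a source word outside the constituent. The function names `project_span` and `constituent_pairs` are hypothetical and this is not presented as the authors' exact algorithm.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(src_span: Span, alignment: Set[Tuple[int, int]]) -> Optional[Span]:
    """Project a source constituent onto the target side (hypothetical helper).

    Returns the minimal target span covering all target words aligned to the
    constituent, or None if the constituent is unaligned or the projection is
    inconsistent with the word alignment.
    """
    s_start, s_end = src_span
    # Target positions linked to any source word inside the constituent.
    linked = [t for (s, t) in alignment if s_start <= s <= s_end]
    if not linked:
        return None
    t_start, t_end = min(linked), max(linked)
    # Consistency check: no target word in the span may align outside the constituent.
    for s, t in alignment:
        if t_start <= t <= t_end and not (s_start <= s <= s_end):
            return None
    return (t_start, t_end)


def constituent_pairs(src: List[str], tgt: List[str], constituents: List[Span],
                      alignment: Set[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Build (source constituent, target phrase) examples from parser spans."""
    pairs = []
    for span in constituents:
        tgt_span = project_span(span, alignment)
        if tgt_span is not None:
            src_text = " ".join(src[span[0]:span[1] + 1])
            tgt_text = " ".join(tgt[tgt_span[0]:tgt_span[1] + 1])
            pairs.append((src_text, tgt_text))
    return pairs


if __name__ == "__main__":
    src = ["the", "black", "cat", "sleeps"]
    tgt = ["le", "chat", "noir", "dort"]
    # Word alignment as (source index, target index) links, e.g. from an SMT aligner.
    alignment = {(0, 0), (1, 2), (2, 1), (3, 3)}
    # Spans assumed to come from a constituency parser: NP, VP, S.
    print(constituent_pairs(src, tgt, [(0, 2), (3, 3), (0, 3)], alignment))
```

Unaligned target words inside a projected span are tolerated here, as in common phrase-extraction practice; a stricter variant could reject such spans. Pairs produced this way would then serve as positive constituent-level examples for a discriminator.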
Improving Neural Machine Translation by Transferring Knowledge from Syntactic Constituent Alignment Learning