research-article

Improving Neural Machine Translation by Transferring Knowledge from Syntactic Constituent Alignment Learning

Published: 29 April 2022

Abstract

Statistical machine translation (SMT) models rely on word-, phrase-, and syntax-level alignments, but neural machine translation (NMT) models rarely learn phrase- and syntax-level alignments explicitly. In this article, we propose to improve NMT by explicitly learning bilingual syntactic constituent alignments. Specifically, we first use syntactic parsers to induce the syntactic structures of sentences, and we then propose two ways to exploit the syntactic constituents within a perceptual (not adversarial) generator-discriminator training framework. One is to use them to measure the alignment scores of sentence-level training examples; the other is to directly score the alignments of constituent-level examples generated by an algorithm based on word-level alignments from SMT. In our generator-discriminator framework, the discriminator is pre-trained to learn constituent alignments and to distinguish the ground-truth translation from fake ones, while the generative translation model is fine-tuned to absorb the alignment knowledge and to generate translations that best approximate the true ones. Experiments and analysis show that the learned constituent alignments help improve translation quality.
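The constituent-level examples described above are generated from SMT word-level alignments. As a rough illustration of how a target span can be paired with a source constituent, the sketch below applies the classic consistency criterion from SMT phrase extraction; this is an assumption about the general approach, not necessarily the authors' exact algorithm, and the function name is hypothetical.

```python
def aligned_target_span(src_span, alignments):
    """Find the target-side span aligned to a source constituent span.

    src_span:   (lo, hi) inclusive word indices of a source constituent.
    alignments: set of (src_idx, tgt_idx) word-alignment links, e.g., as
                produced by an SMT aligner such as GIZA++ or fast_align.
    Returns the minimal covering target span, or None if no consistent
    span exists.
    """
    s_lo, s_hi = src_span
    # Collect all target positions linked to words inside the source span.
    tgt = [t for s, t in alignments if s_lo <= s <= s_hi]
    if not tgt:
        return None
    t_lo, t_hi = min(tgt), max(tgt)
    # Consistency check: no word inside the candidate target span may be
    # aligned to a source word outside the constituent.
    for s, t in alignments:
        if t_lo <= t <= t_hi and not (s_lo <= s <= s_hi):
            return None
    return (t_lo, t_hi)
```

For example, with links {(0, 0), (1, 2), (2, 1)}, the source span (1, 2) maps consistently to the target span (1, 2), whereas a source span whose target words are also linked outside it yields no constituent-level example.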



Published in ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5 (September 2022), 486 pages. ISSN: 2375-4699. EISSN: 2375-4702. Issue DOI: 10.1145/3533669.


Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History

• Received: 1 August 2020
• Revised: 1 October 2021
• Accepted: 1 January 2022
• Online AM: 23 March 2022
• Published: 29 April 2022

Qualifiers: research-article, refereed.