short-paper

POS Tag-enhanced Coarse-to-fine Attention for Neural Machine Translation

Published: 22 April 2019

Abstract

Although neural machine translation (NMT) can implicitly learn some semantic information from sentences, we show that Part-of-Speech (POS) tags can be explicitly and effectively incorporated into the attention mechanism of NMT to yield further improvements. In this article, we propose an NMT model with a tag-enhanced attention mechanism, in which NMT and POS tagging are jointly modeled via multi-task learning. Besides following the common practice of enriching encoder annotations with predicted source POS tags, we exploit predicted target POS tags to refine the attention model in a coarse-to-fine manner. Specifically, we first perform a coarse attention operation solely over the source annotations and the target hidden state, and the resulting context vector is used to update the target hidden state from which the target POS tag is predicted. We then perform a fine attention operation that extends the coarse one by further exploiting the predicted target POS tags. Finally, word prediction draws on both the context vector from the fine attention and the predicted target POS tags. Experimental results and further analyses on Chinese-English and Japanese-English translation tasks demonstrate the superiority of our proposed model over conventional NMT models. We release our code at https://github.com/middlekisser/PEA-NMT.git.
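The coarse-to-fine decoding step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: all weight matrices are random stand-ins for learned parameters, the attention is plain dot-product attention, and the names (`c_coarse`, `W_pos`, `E_pos`, etc.) are hypothetical. It shows only the data flow: coarse attention over source annotations, a soft target-POS prediction, a POS-refined fine attention, and a word distribution computed from both.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(annotations, query):
    """Dot-product attention: weight each source annotation by its
    similarity to the query; return the weighted context vector."""
    weights = softmax(annotations @ query)   # (src_len,)
    return weights @ annotations             # (d,)

# Toy dimensions; every weight below stands in for a learned parameter.
d, src_len, n_pos, vocab = 8, 5, 4, 20
annotations = rng.normal(size=(src_len, d))  # encoder outputs
s_t = rng.normal(size=d)                     # decoder hidden state at step t
W_pos = rng.normal(size=(2 * d, n_pos))      # target POS classifier
E_pos = rng.normal(size=(n_pos, d))          # target POS tag embeddings
W_out = rng.normal(size=(3 * d, vocab))      # word predictor

# 1. Coarse attention: source annotations and target hidden state only.
c_coarse = attend(annotations, s_t)

# 2. Predict the target POS tag from the POS-tagging hidden state
#    (here: hidden state updated with the coarse context vector).
pos_probs = softmax(np.concatenate([s_t, c_coarse]) @ W_pos)
pos_emb = pos_probs @ E_pos                  # soft POS embedding

# 3. Fine attention: the query is refined with the predicted POS tag.
c_fine = attend(annotations, s_t + pos_emb)

# 4. Word prediction uses both the fine context and the POS prediction.
word_probs = softmax(np.concatenate([s_t, c_fine, pos_emb]) @ W_out)
```

In the actual model the POS prediction and translation losses would be trained jointly (multi-task learning); the sketch only traces a single decoding step forward.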



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
December 2019, 305 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3327969

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 June 2018
• Revised: 1 February 2019
• Accepted: 1 February 2019
• Published: 22 April 2019

          Qualifiers

          • short-paper
          • Research
          • Refereed