Syntax-Based Chinese-Vietnamese Tree-to-Tree Statistical Machine Translation with Bilingual Features

Published: 31 May 2019

Abstract

Because of the scarcity of bilingual corpora, current Chinese-Vietnamese machine translation is far from satisfactory. Considering the differences between Chinese and Vietnamese, we investigate whether linguistic differences can be used to supervise machine translation and propose a method of syntax-based Chinese-Vietnamese tree-to-tree statistical machine translation with bilingual features. Analyzing the syntactic differences between Chinese and Vietnamese, we define linguistic difference-based rules, covering attributive position, time adverbial position, and locative adverbial position, and create rewards for similar rules. These rewards are integrated into the extraction of tree-to-tree translation rules, and we optimize the pruning of the search space during the decoding phase. Experiments on Chinese-Vietnamese bilingual sentence translation show that the proposed method performs better than the several methods it is compared against. Further, the results show that syntactic difference features, combined with search pruning, can improve the accuracy of machine translation without degrading efficiency.
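As a rough illustration of the idea of rewarding rules that match a known word-order difference, the following toy sketch scores a translation rule higher when an attributive precedes its head on the source side and follows it on the target side, then prunes candidates with a simple beam. Everything here is an assumption made for illustration: the `Rule` structure, the labels `ADJ` and `NN`, the `bonus` and `weight` values, and the beam pruning are hypothetical and are not taken from the paper, whose actual features and pruning strategy are described in its full text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Toy tree-to-tree rule (hypothetical structure, for illustration only).

    alignment[i] is the target-side position of source child i.
    """
    source_children: tuple
    target_children: tuple
    alignment: tuple


def attributive_reward(rule, head="NN", modifier="ADJ", bonus=1.0):
    """Return `bonus` if the modifier precedes the head on the source side
    and follows it on the target side; otherwise return 0.0."""
    try:
        s_head = rule.source_children.index(head)
        s_mod = rule.source_children.index(modifier)
    except ValueError:
        # The rule does not contain both a head and a modifier.
        return 0.0
    t_head = rule.alignment[s_head]
    t_mod = rule.alignment[s_mod]
    return bonus if (s_mod < s_head and t_mod > t_head) else 0.0


def score(rule, base_log_prob, weight=0.5):
    """Add the weighted reward to a base rule score (assumed log-linear form)."""
    return base_log_prob + weight * attributive_reward(rule)


def prune(candidates, beam_size):
    """Keep the `beam_size` highest-scoring (rule, score) pairs."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
```

In this sketch, a rule that swaps `ADJ NN` into `NN ADJ` receives the bonus while a monotone rule does not, so under equal base scores the reordering rule survives a beam of size one.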

