Abstract
Because bilingual corpora are scarce, current Chinese--Vietnamese machine translation is far from satisfactory. Considering the differences between Chinese and Vietnamese, we investigate whether linguistic differences can be used to supervise machine translation, and we propose a syntax-based Chinese--Vietnamese tree-to-tree statistical machine translation method with bilingual features. By analyzing the syntactic differences between Chinese and Vietnamese, we define linguistic difference-based rules, such as rules for attributive position, time adverbial position, and locative adverbial position, and create rewards for similar rules. These rewards are integrated into the extraction of tree-to-tree translation rules, and we optimize the pruning of the search space during the decoding phase. Experiments on Chinese--Vietnamese bilingual sentence translation show that the proposed method performs better than the several methods it is compared against. Further, the results show that syntactic difference features, combined with search pruning, can improve the accuracy of machine translation without degrading efficiency.
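The idea of rewarding translation rules that match a known word-order difference can be illustrated with a minimal sketch. It is not the paper's implementation: the `TreeRule` class, the constituent labels (`ADJP`, `TIME`, `LOC`), the pattern table, the bonus value, and the weight are all hypothetical, chosen only to show how a reward term might be added to a rule's score when the target side reverses a modifier and its head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRule:
    """Toy tree-to-tree rule: child labels on the source and target sides."""
    root: str
    source_children: tuple  # Chinese side, e.g. ("ADJP", "NP")
    target_children: tuple  # Vietnamese side, e.g. ("NP", "ADJP")


# Hypothetical difference patterns as (modifier, head) pairs in source order.
# The labels are illustrative assumptions, not the paper's tag set.
DIFFERENCE_PATTERNS = {
    "attributive_position": ("ADJP", "NP"),
    "time_adverbial_position": ("TIME", "VP"),
    "locative_adverbial_position": ("LOC", "VP"),
}


def difference_reward(rule: TreeRule, bonus: float = 1.0) -> float:
    """Return `bonus` if the rule swaps a listed modifier/head pair, else 0."""
    for modifier, head in DIFFERENCE_PATTERNS.values():
        if (rule.source_children == (modifier, head)
                and rule.target_children == (head, modifier)):
            return bonus
    return 0.0


def rule_score(rule: TreeRule, log_prob: float, weight: float = 0.5) -> float:
    """Combine a rule's log-probability with the weighted difference reward."""
    return log_prob + weight * difference_reward(rule)


# A rule that moves the attributive after the noun receives the reward;
# a rule that keeps the source order does not.
swapped = TreeRule("NP", ("ADJP", "NP"), ("NP", "ADJP"))
monotone = TreeRule("NP", ("ADJP", "NP"), ("ADJP", "NP"))
print(rule_score(swapped, -2.0), rule_score(monotone, -2.0))
```

In this sketch the reward is a single additive feature, so it could sit alongside other features in a log-linear model; how the rewards are actually defined and weighted is specified in the paper itself, not here.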
Syntax-Based Chinese-Vietnamese Tree-to-Tree Statistical Machine Translation with Bilingual Features