Abstract
A rule-based pre-ordering approach is proposed for statistical Japanese-to-English machine translation that uses the dependency structure of source-side sentences. To resolve the reordering problem, a Japanese sentence is pre-ordered into an English-like order at the morpheme level before it is passed to a statistical machine translation system, in both the training and decoding phases. In this article, extra-chunk pre-ordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which restricts the reordering of morphemes to within a chunk. A linguistically oriented discussion shows that correct pre-ordering cannot be achieved without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-ordering approaches designed for Japanese-to-English translation and with a language-independent statistical pre-ordering approach. The comparison uses a standard patent dataset and a news dataset obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in the experiments. Experimental results show that the proposed approach outperforms the compared approaches on the automatic reordering measures (Kendall’s τ, Spearman’s ρ, fuzzy reordering score, and test-set RIBES) and on the precision-based automatic translation measure, test-set BLEU.
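The abstract reports reordering quality with Kendall’s τ, Spearman’s ρ, and the fuzzy reordering score. The following is a minimal Python sketch of these three measures. It assumes that each pre-ordered sentence has already been reduced to a permutation `perm`, where `perm[i]` is the reference (English-order) position of the word placed at position i. Alignment extraction, ties, and unaligned words are not handled. The function names are hypothetical. The fuzzy reordering score uses the form 1 - (C - 1)/(M - 1) from Talbot et al. (2011) without sentence-boundary markers. The article's exact variants may differ, for example in normalizing τ to [0, 1].

```python
from itertools import combinations


def kendall_tau(perm):
    """Kendall's tau between the reference order (identity) and `perm`.

    Assumption: perm is a permutation of 0..n-1 (no ties, no unaligned words).
    Returns a value in [-1, 1]; 1 means the pre-ordering matches the reference.
    """
    n = len(perm)
    if n < 2:
        return 1.0
    total_pairs = n * (n - 1) / 2
    # A pair (i, j) with i < j is concordant if the two words keep their relative order.
    concordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] < perm[j])
    discordant = total_pairs - concordant
    return (concordant - discordant) / total_pairs


def spearman_rho(perm):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid only without ties."""
    n = len(perm)
    if n < 2:
        return 1.0
    d2 = sum((p - i) ** 2 for i, p in enumerate(perm))
    return 1 - 6 * d2 / (n * (n * n - 1))


def fuzzy_reordering_score(perm):
    """1 - (C - 1) / (M - 1), where C counts maximal monotone contiguous chunks
    and M is the number of words (no boundary markers added)."""
    m = len(perm)
    if m < 2:
        return 1.0
    # A new chunk starts wherever the next word is not the reference successor.
    chunks = 1 + sum(1 for a, b in zip(perm, perm[1:]) if b != a + 1)
    return 1 - (chunks - 1) / (m - 1)


if __name__ == "__main__":
    # Hypothetical five-word sentence whose two blocks come out swapped.
    perm = [3, 4, 0, 1, 2]
    print(kendall_tau(perm), spearman_rho(perm), fuzzy_reordering_score(perm))
    # prints: -0.2 -0.5 0.75
```

In this example, the fuzzy score stays fairly high at 0.75 because only one block boundary is broken. The rank correlations are negative because every word in the first block is out of order with respect to every word in the second block.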
Inter-, Intra-, and Extra-Chunk Pre-Ordering for Statistical Japanese-to-English Machine Translation