research-article

Inter-, Intra-, and Extra-Chunk Pre-Ordering for Statistical Japanese-to-English Machine Translation

Published: 9 January 2016

Abstract

A rule-based pre-ordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. To resolve the reordering problem, a Japanese sentence is pre-ordered into an English-like order at the morpheme level before being passed to a statistical machine translation system in both the training and decoding phases. In this article, extra-chunk pre-ordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which confines the reordering of morphemes to within a chunk. Linguistically oriented discussions show that correct pre-ordering cannot be realized without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-ordering approaches designed for Japanese-to-English translation and with a language-independent statistical pre-ordering approach, on a standard patent dataset and on a news dataset obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in the experiments. Experimental results show that the proposed approach outperforms the compared approaches on automatic reordering measures (Kendall’s τ, Spearman’s ρ, fuzzy reordering score, and test-set RIBES) and on the automatic translation precision measure, test-set BLEU.
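The reordering measures named in the abstract can be illustrated with a minimal sketch. It assumes the pre-ordered output is represented as a permutation `perm`, where `perm[i]` is the position that the i-th output token takes in the reference (English-like) order. The function names are hypothetical, and the formulas are the textbook definitions of Kendall's τ and Spearman's ρ plus a commonly used chunk-based form of the fuzzy reordering score. The article's exact formulations are not given in the abstract and may differ.

```python
from itertools import combinations


def kendall_tau(perm):
    """Kendall's tau of a permutation against the identity order.

    Assumption: perm is a permutation of 0..n-1 with no ties.
    """
    n = len(perm)
    if n < 2:
        return 1.0
    total = n * (n - 1) // 2
    # A pair is concordant when its two items appear in increasing order.
    concordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] < perm[j])
    return (2 * concordant - total) / total


def spearman_rho(perm):
    """Spearman's rho of a permutation against the identity order (no ties)."""
    n = len(perm)
    if n < 2:
        return 1.0
    squared_diffs = sum((p - i) ** 2 for i, p in enumerate(perm))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def fuzzy_reordering_score(perm):
    """Chunk-based fuzzy reordering score (assumed form: 1 - (C - 1) / (M - 1)).

    C counts maximal runs of tokens that are adjacent and in order in the
    reference; M is the number of tokens.
    """
    n = len(perm)
    if n < 2:
        return 1.0
    chunks = 1 + sum(1 for a, b in zip(perm, perm[1:]) if b != a + 1)
    return 1 - (chunks - 1) / (n - 1)


if __name__ == "__main__":
    example = [1, 0, 2, 3]  # first two tokens swapped relative to the reference
    print(kendall_tau(example), spearman_rho(example), fuzzy_reordering_score(example))
```

All three scores equal 1 for a perfectly ordered output. For a fully reversed output, τ and ρ fall to -1 and the fuzzy reordering score falls to 0.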

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 15, Issue 3
      March 2016
      220 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/2876004

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 January 2016
      • Revised: 1 August 2015
      • Accepted: 1 August 2015
      • Received: 1 June 2014

      Qualifiers

      • research-article
      • Research
      • Refereed
