Modeling Monolingual Character Alignment for Automatic Evaluation of Chinese Translation

Published: 28 January 2016

Abstract

Automatic evaluation of machine translations is an important task. Most existing evaluation metrics rely on matching the same word or letter n-grams. This strategy leads to poor results on Chinese translations because one has to rely merely on matching identical characters. In this article, we propose a new evaluation metric that allows different characters with the same or similar meaning to match. An Indirect Hidden Markov Model (IHMM) is proposed to align the Chinese translation with human references at the character level. In the model, the emission probabilities are estimated by character similarity, including character semantic similarity and character surface similarity, and transition probabilities are estimated by a heuristic distance-based distortion model. When evaluating the submitted output of English-to-Chinese translation systems in the IWSLT’08 CT-EC and NIST’08 EC tasks, the experimental results indicate that the proposed metric has a significantly better correlation with human evaluation than the state-of-the-art machine translation metrics (i.e., BLEU, Meteor Universal, and TESLA-CELAB). This study shows that it is important to allow different characters to match in the evaluation of Chinese translations and that the IHMM is a reasonable approach for the alignment of Chinese characters.
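
The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the similarity table, the floor value, the uniform start distribution, and the exact form of the distortion function are all assumptions made for illustration. Emissions come from a toy character similarity, transitions from a distance-based weight that favors moving one position forward, and Viterbi decoding picks the best reference position for each character of the translation.

```python
import math

# Hypothetical toy table of near-synonymous character pairs. The article's
# real semantic and surface similarity resources are not given in the abstract.
TOY_SYNONYMS = {frozenset(("看", "见")): 0.8}


def similarity(a, b):
    """Toy character similarity in [0, 1]; identical characters score 1.0."""
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset((a, b)), 0.0)


def distortion(prev_i, i, k=2.0):
    """Unnormalized transition weight; largest when i == prev_i + 1 (assumed form)."""
    return (1.0 + abs(i - prev_i - 1)) ** (-k)


def align(hyp, ref, floor=1e-3):
    """Viterbi decoding: return one reference index per hypothesis character."""
    if not hyp or not ref:
        return []
    n = len(ref)

    def emit(i, j):
        # Emission score from similarity, floored so log() is always defined.
        return math.log(max(similarity(ref[i], hyp[j]), floor))

    def trans(p, i):
        # Normalize the distortion weights over all target positions.
        z = sum(distortion(p, q) for q in range(n))
        return math.log(distortion(p, i) / z)

    # Uniform start distribution over reference positions (an assumption).
    score = [emit(i, 0) - math.log(n) for i in range(n)]
    back = []
    for j in range(1, len(hyp)):
        new_score, pointers = [], []
        for i in range(n):
            best_p = max(range(n), key=lambda p: score[p] + trans(p, i))
            new_score.append(score[best_p] + trans(best_p, i) + emit(i, j))
            pointers.append(best_p)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    i = max(range(n), key=lambda q: score[q])
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return path


alignment = align("我看到他", "我见到他")
```

In this toy case the translation and the reference differ in one character (看 versus 见). Exact matching would leave that position unaligned, whereas the similarity-based emission lets the two characters align, which is the behavior the abstract argues for.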

References

  1. S. Banerjee and A. Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65--72.
  2. P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics 16, 2, 79--85.
  3. P. F. Brown, S. A. D. Pietra, V. J. D. Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics 19, 2, 263--311.
  4. O. Bojar, C. Buck, C. Callison-Burch, C. Federmann, B. Haddow, P. Koehn, C. Monz, M. Post, R. Soricut, and L. Specia. 2013. Findings of the 2013 workshop on statistical machine translation. In Proceedings of the 8th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Sofia, Bulgaria, 1--44.
  5. O. Bojar, C. Buck, C. Federmann, B. Haddow, P. Koehn, J. Leveling, C. Monz, P. Pecina, M. Post, H. Saint-Amand, R. Soricut, L. Specia, and A. V. S. Tamchyna. 2014. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Baltimore, USA, 12--58.
  6. C. Callison-Burch, P. Koehn, C. Monz, M. Post, R. Soricut, and L. Specia. 2012. Findings of the 2012 workshop on statistical machine translation. In Proceedings of the 7th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Montreal, Quebec, Canada, 10--51.
  7. Y. S. Chan and H. T. Ng. 2008. MAXSIM: A maximum similarity metric for machine translation evaluation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, Columbus, Ohio, 55--62.
  8. B. Chen, R. Kuhn, and S. Larkin. 2012. PORT: A precision-order-recall MT evaluation metric for tuning. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, Jeju Island, Korea, 930--939.
  9. M. Denkowski and A. Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Baltimore, USA, 376--380.
  10. Y. Ding and M. Palmer. 2005. Machine translation using probabilistic synchronous dependency insertion grammars. In Proceedings of the 43rd Annual Meeting of the Association of Computational Linguistics (ACL). Association for Computational Linguistics, Ann Arbor, Michigan, 541--548.
  11. G. Doddington. 2002. Automatic evaluation of machine translation quality using N-gram co-occurrence statistics. In Proceedings of the 2nd International Conference on Human Language Technology Research (HLT). Association for Computational Linguistics, San Diego, California, USA, 138--145.
  12. X. He, M. Yang, J. Gao, P. Nguyen, and R. Moore. 2008. Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Honolulu, Hawaii, 98--107.
  13. Y. He, J. Zhang, M. Li, L. Fang, Y. Chen, Y. Zhou, and C. Zong. 2008. The CASIA statistical machine translation system for IWSLT’2008. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT). Association for Computational Linguistics, Hawaii, USA, 85--91.
  14. M. Fishel, O. Bojar, D. Zeman, and J. Berka. 2011. Automatic translation error analysis. In Proceedings of the 14th International Conference on Text, Speech and Dialogue (TSD). Springer-Verlag Berlin Heidelberg, Pilsen, Czech Republic, 72--79.
  15. M. Galley, J. Graehl, K. Knight, D. Marcu, S. Deneefe, W. Wang, and I. Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (ACL/COLING). Association for Computational Linguistics, Sydney, Australia, 961--968.
  16. M. Hopkins and J. May. 2013. Models of translation competitions. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, Sofia, Bulgaria, 1416--1424.
  17. B. Jones, J. Andreas, D. Bauer, K. M. Hermann, and K. Knight. 2012. Semantics-based machine translation with hyperedge replacement grammars. In Proceedings of the 24th International Conference on Computational Linguistics (COLING). The COLING 2012 Organizing Committee, Mumbai, India, 1359--1376.
  18. P. Koehn. 2012. Simulating human judgment in machine translation evaluation campaigns. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT). Association for Computational Linguistics, Hong Kong, 179--183.
  19. P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference and the North American Association for Computational Linguistics (HLT-NAACL). Association for Computational Linguistics, Sapporo, Japan, 127--133.
  20. M. Li, C. Zong, and H. T. Ng. 2011. Automatic evaluation of Chinese translation output: Word-level or character-level? In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL/HLT). Association for Computational Linguistics, Portland, Oregon, USA, 159--164.
  21. C. Liu and H. T. Ng. 2012. Character-level machine translation evaluation for languages with ambiguous word boundaries. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, Jeju Island, Korea, 921--929.
  22. Y. Liu, Q. Liu, and S. Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of Association for Computational Linguistics (COLING/ACL). Association for Computational Linguistics, Sydney, Australia, 609--616.
  23. C. Liu, D. Dahlmeier, and H. T. Ng. 2010. TESLA: Translation evaluation of sentences with linear-programming-based analysis. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Association for Computational Linguistics, Uppsala, Sweden, 354--359.
  24. M. Machacek and O. Bojar. 2014. Results of the WMT14 metrics shared task. In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Baltimore, USA, 293--301.
  25. F. J. Och and H. Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL). Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 295--302.
  26. K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL). Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311--318.
  27. M. Paul. 2008. Overview of the IWSLT’2008 evaluation campaign. In Proceedings of IWSLT’2008. Association for Computational Linguistics, Hawaii, USA, 1--17.
  28. K. Sakaguchi, M. Post, and B. Van Durme. 2014. Efficient elicitation of annotations for human evaluation of machine translation. In Proceedings of the 9th Workshop on Statistical Machine Translation (WMT). Association for Computational Linguistics, Baltimore, USA, 1--11.
  29. M. Snover, B. Dorr, R. Schwartz, J. Makhoul, L. Micciulla, and R. Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (AMTA). Association for Machine Translation in the Americas, Boston Marriott, Cambridge, Massachusetts, USA, 223--231.
  30. S. Vogel, H. Ney, and C. Tillmann. 1996. HMM-based word alignment in statistical translation. In Proceedings of the International Conference on Computational Linguistics (COLING). Association for Computational Linguistics, Copenhagen, Denmark, 836--841.
  31. K. Wang, C. Zong, and K.-Y. Su. 2010. A character-based joint model for Chinese word segmentation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING). Coling 2010 Organizing Committee, Beijing, China, 1173--1181.
  32. T. Xiao and J. Zhu. 2013. Unsupervised sub-tree alignment for tree-to-tree translation. Journal of Artificial Intelligence Research (JAIR) 48 (2013), 733--782.
  33. D. Xiong, Q. Liu, and S. Lin. 2006. Maximum entropy based phrase reordering model for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL). Association for Computational Linguistics, Sydney, Australia, 521--528.
  34. D. Xiong and M. Zhang. 2014. A sense-based translation model for statistical machine translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, Baltimore, Maryland, 1459--1469.
  35. F. Zhai, J. Zhang, Y. Zhou, and C. Zong. 2013. Unsupervised tree induction for tree-based translation. Transactions of Association for Computational Linguistics (TACL) 1 (2013), 243--254.
  36. J. Zhang, S. Liu, M. Li, M. Zhou, and C. Zong. 2014. Mind the gap: machine translation by minimizing the semantic gap in embedding space. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI). Association for the Advancement of Artificial Intelligence, Québec, Canada, 1657--1663.
