Abstract
Automatic evaluation of machine translations is an important task. Most existing evaluation metrics rely on matching the same word or letter n-grams. This strategy leads to poor results on Chinese translations because one has to rely merely on matching identical characters. In this article, we propose a new evaluation metric that allows different characters with the same or similar meaning to match. An Indirect Hidden Markov Model (IHMM) is proposed to align the Chinese translation with human references at the character level. In the model, the emission probabilities are estimated by character similarity, including character semantic similarity and character surface similarity, and transition probabilities are estimated by a heuristic distance-based distortion model. When evaluating the submitted output of English-to-Chinese translation systems in the IWSLT’08 CT-EC and NIST’08 EC tasks, the experimental results indicate that the proposed metric has a significantly better correlation with human evaluation than the state-of-the-art machine translation metrics (i.e., BLEU, Meteor Universal, and TESLA-CELAB). This study shows that it is important to allow different characters to match in the evaluation of Chinese translations and that the IHMM is a reasonable approach for the alignment of Chinese characters.
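The alignment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity table, the interpolation weight `alpha`, the probability floor, the uniform initial distribution, and the distortion form `(1 + |d - 1|)^(-k)` are all assumptions chosen for illustration. Hidden states are reference character positions, observations are the characters of the translation, and Viterbi decoding picks the most likely alignment.

```python
import math


def emission(ref_char, hyp_char, semantic_sim, alpha=0.5, floor=1e-4):
    """Emission score from character similarity (hypothetical weighting).

    Surface similarity is reduced to exact identity here; semantic_sim is a
    toy lookup table standing in for a real semantic similarity resource.
    """
    surface = 1.0 if ref_char == hyp_char else 0.0
    if ref_char == hyp_char:
        semantic = 1.0
    else:
        semantic = semantic_sim.get(
            (ref_char, hyp_char), semantic_sim.get((hyp_char, ref_char), 0.0)
        )
    # Floor keeps the log finite for unrelated characters.
    return max(alpha * semantic + (1.0 - alpha) * surface, floor)


def transition(prev_i, i, n_ref, k=2.0):
    """Distance-based distortion: favors moving one position to the right."""
    weights = [(1 + abs(j - prev_i - 1)) ** -k for j in range(n_ref)]
    return weights[i] / sum(weights)


def viterbi_align(reference, hypothesis, semantic_sim):
    """Return, for each hypothesis character, the aligned reference index."""
    n, m = len(reference), len(hypothesis)
    if n == 0 or m == 0:
        return []
    score = [[-math.inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    for i in range(n):
        # Uniform initial distribution over reference positions (assumption).
        score[0][i] = math.log(1.0 / n) + math.log(
            emission(reference[i], hypothesis[0], semantic_sim)
        )
    for t in range(1, m):
        for i in range(n):
            best_prev = max(
                range(n),
                key=lambda p: score[t - 1][p] + math.log(transition(p, i, n)),
            )
            score[t][i] = (
                score[t - 1][best_prev]
                + math.log(transition(best_prev, i, n))
                + math.log(emission(reference[i], hypothesis[t], semantic_sim))
            )
            back[t][i] = best_prev
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[m - 1][i])
    path = [state]
    for t in range(m - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


# Toy similarity values, invented for this example only.
toy_sim = {("喜", "爱"): 0.8, ("欢", "爱"): 0.3}
alignment = viterbi_align("我喜欢猫", "我爱猫", toy_sim)
print(alignment)
```

In this toy case the translation character 爱 has no identical counterpart in the reference, so an exact-match metric would give it no credit, whereas the similarity-based emission lets it align to 喜.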
Modeling Monolingual Character Alignment for Automatic Evaluation of Chinese Translation