Abstract
Most syntax-based metrics compute similarity by comparing sub-structures extracted from the hypothesis and reference trees. Because the length of these sub-structures is limited, they cannot represent all the information in the trees. To make fuller use of the reference syntax information, we propose a new automatic evaluation metric based on a dependency parsing model. First, a dependency parsing model is trained for each sentence using the reference dependency tree. Then, the hypothesis is parsed with this model to generate the corresponding hypothesis dependency tree. The quality of the hypothesis is judged by the quality of its dependency tree. A unigram F-score is included in the new metric to capture lexical similarity. In our experiments, the proposed metric outperforms METEOR and BLEU at the system level and achieves results comparable to METEOR at the sentence level. To further improve performance, we also propose a combined metric, which achieves the best performance at both the sentence level and the system level.
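The lexical component mentioned in the abstract can be illustrated with a minimal sketch of a unigram F-score. The abstract does not give the exact formulation, so the choices here are assumptions made for illustration: whitespace tokenization, lowercasing, exact-match clipped counts, and a balanced weighting (`beta = 1.0`). The function name `unigram_f_score` is hypothetical.

```python
from collections import Counter


def unigram_f_score(hypothesis: str, reference: str, beta: float = 1.0) -> float:
    """Clipped unigram F-score between two whitespace-tokenized sentences.

    Assumption: exact lowercase matching and a single reference; the paper's
    metric may use different matching or weighting.
    """
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    if not hyp_tokens or not ref_tokens:
        return 0.0

    # Counter intersection keeps the minimum count per token, so a word
    # repeated in the hypothesis is credited at most as often as it
    # appears in the reference.
    matches = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if matches == 0:
        return 0.0

    precision = matches / len(hyp_tokens)
    recall = matches / len(ref_tokens)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    print(unigram_f_score("the cat sat on the mat", "the cat is on the mat"))
```

In a full metric of the kind described above, a score like this would be combined with the syntactic score derived from the hypothesis dependency tree; how the two are combined is not specified in the abstract.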
Index Terms
Machine Translation Evaluation Metric Based on Dependency Parsing Model