Machine Translation Evaluation Metric Based on Dependency Parsing Model

Published: 10 June 2019

Abstract

Most syntax-based metrics measure similarity by comparing sub-structures extracted from the hypothesis and reference trees. Because the length of these sub-structures is limited, they cannot represent all the information in the trees. To make fuller use of the reference syntax information, we propose a new automatic evaluation metric based on a dependency parsing model. First, for each sentence, a dependency parsing model is trained using the reference dependency tree. Then, the hypothesis is parsed by this model to generate the corresponding hypothesis dependency tree. The quality of the hypothesis is judged by the quality of its dependency tree. A unigram F-score is included in the new metric to capture lexical similarity. In experiments, the proposed metric performs better than METEOR and BLEU at the system level and achieves results comparable to METEOR at the sentence level. To further improve performance, we also propose a combined metric, which achieves the best performance at both the sentence level and the system level.
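
The unigram F-score component mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, lowercasing, clipped unigram matching, and a balanced weighting (`beta = 1.0`). The function name `unigram_f_score` and all of these choices are illustrative assumptions, not the authors' implementation, and the dependency-tree component of the metric is not shown.

```python
from collections import Counter


def unigram_f_score(hypothesis: str, reference: str, beta: float = 1.0) -> float:
    """Clipped unigram F-score between a hypothesis and a reference.

    beta weights recall relative to precision; beta=1.0 (balanced) is an
    assumption made for this sketch, not a value taken from the paper.
    """
    hyp_counts = Counter(hypothesis.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word,
    # so a word repeated in the hypothesis is not credited more often
    # than it appears in the reference.
    matches = sum((hyp_counts & ref_counts).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp_counts.values())
    recall = matches / sum(ref_counts.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


if __name__ == "__main__":
    score = unigram_f_score("the cat sat on the mat", "the cat is on the mat")
    print(f"unigram F-score: {score:.3f}")
```

In this example five of the six hypothesis words match the reference, so precision and recall are both 5/6. A lexical score of this kind ignores word order and structure, which is why the abstract pairs it with the quality of the hypothesis dependency tree.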

References

  1. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65--72. http://www.aclweb.org/anthology/W/W05/W05-0909.
  2. Yee Seng Chan and Hwee Tou Ng. 2008. MAXSIM: A maximum similarity metric for machine translation evaluation. In Proceedings of ACL-08: HLT. 55--62.
  3. Boxing Chen and Roland Kuhn. 2011. AMBER: A modified BLEU, enhanced ranking metric. In Proceedings of the 6th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Edinburgh, Scotland, 71--77. http://www.aclweb.org/anthology/W11-2105.
  4. Boxing Chen, Roland Kuhn, and George Foster. 2012. Improving AMBER, an MT evaluation metric. In Proceedings of the 7th Workshop on Statistical Machine Translation (WMT’12). Association for Computational Linguistics, Stroudsburg, PA, 59--63. http://dl.acm.org/citation.cfm?id=2393015.2393021
  5. Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the EMNLP 2011 Workshop on Statistical Machine Translation.
  6. Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language-specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation.
  7. Markus Dreyer and Daniel Marcu. 2012. Hyter: Meaning-equivalent semantics for translation evaluation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 162--171.
  8. Melania Duma, Cristina Vertan, and Wolfgang Menzel. 2013. A new syntactic metric for evaluation of machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop. Association for Computational Linguistics, Sofia, Bulgaria, 130--135. http://www.aclweb.org/anthology/P13-3019.
  9. Hiroshi Echizen-ya and Kenji Araki. 2010. Automatic evaluation method for machine translation using noun-phrase chunking. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL’10). Association for Computational Linguistics, Stroudsburg, PA, 108--117. http://dl.acm.org/citation.cfm?id=1858681.1858693
  10. Shubham Gautam and Pushpak Bhattacharyya. 2014. LAYERED: Metric for machine translation evaluation. In Proceedings of the 9th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Baltimore, Maryland, 387--393. http://www.aclweb.org/anthology/W14-3350.
  11. Meritxell Gonzàlez, Alberto Barrón-Cedeño, and Lluís Màrquez. 2014. IPA and STOUT: Leveraging linguistic and source-based features for machine translation evaluation. In Proceedings of the 9th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Baltimore, Maryland, 394--401. http://www.aclweb.org/anthology/W14-3351.
  12. Shafiq Joty, Francisco Guzmán, Lluís Màrquez, and Preslav Nakov. 2014. DiscoTK: Using discourse structure for machine translation evaluation. In Proceedings of the 9th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Baltimore, Maryland, 402--408. http://www.aclweb.org/anthology/W14-3352.
  13. Alon Lavie and Abhaya Agarwal. 2007. Meteor: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the 2nd Workshop on Statistical Machine Translation (StatMT’07). Association for Computational Linguistics, Stroudsburg, PA, 228--231. http://dl.acm.org/citation.cfm?id=1626355.1626389
  14. Alon Lavie and Michael J. Denkowski. 2009. The METEOR metric for automatic evaluation of machine translation. Machine Translation 23, 2--3 (2009), 105--115.
  15. Ding Liu and Daniel Gildea. 2005. Syntactic features for evaluation of machine translation. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 25--32.
  16. Chi-kiu Lo, Anand Karthik Tumuluru, and Dekai Wu. 2012. Fully automatic semantic MT evaluation. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 243--252.
  17. Chi-kiu Lo, Anand Karthik Tumuluru, and Dekai Wu. 2012. Fully automatic semantic MT evaluation. In Proceedings of the 7th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Montréal, Canada, 243--252. http://www.aclweb.org/anthology/W12-3129.
  18. Chi-kiu Lo and Dekai Wu. 2011. MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Portland, Oregon, 220--229. http://www.aclweb.org/anthology/P11-1023.
  19. Chi-kiu Lo and Dekai Wu. 2013. MEANT at WMT 2013: A tunable, accurate yet inexpensive semantic frame based MT evaluation metric. In Proceedings of the 8th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Sofia, Bulgaria, 422--428. http://www.aclweb.org/anthology/W13-2254.
  20. Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 1412--1421.
  21. Dennis Mehay and Chris Brew. 2007. BLEUÂTRE: Flattening syntactic dependencies for MT evaluation. In Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI).
  22. Fandong Meng, Zhengdong Lu, Hang Li, and Qun Liu. 2016. Interactive attention for neural machine translation. In Proceedings of the International Conference on Computational Linguistics (2016), 2174--2185.
  23. F. J. Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, 160--167.
  24. Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Dependency-based automatic evaluation for machine translation. In Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation (SSST’07). Association for Computational Linguistics, Stroudsburg, PA, 80--87. http://dl.acm.org/citation.cfm?id=1626281.1626292
  25. Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Evaluating machine translation with LFG dependencies. Machine Translation 21, 2 (June 2007), 95--119.
  26. Karolina Owczarzak, Josef van Genabith, and Andy Way. 2007. Labelled dependencies in machine translation evaluation. In Proceedings of the 2nd Workshop on Statistical Machine Translation (StatMT’07). Association for Computational Linguistics, Stroudsburg, PA, 104--111. http://dl.acm.org/citation.cfm?id=1626355.1626369
  27. Sebastian Padó, Michel Galley, Dan Jurafsky, and Christopher D. Manning. 2009. Textual entailment features for machine translation evaluation. In Proceedings of the 4th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 37--41.
  28. K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 311--318.
  29. Maja Popović and Hermann Ney. 2009. Syntax-oriented evaluation measures for machine translation output. In Proceedings of the 4th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 29--32.
  30. Martin F. Porter. 2001. Snowball: A language for stemming algorithms. [Online]. http://snowball.tartarus.org/texts/introduction.html.
  31. Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas. 223--231.
  32. Matthew Snover, Nitin Madnani, Bonnie J. Dorr, and Richard Schwartz. 2009. Fluency, adequacy, or HTER?: Exploring different human judgments with a tunable MT metric. In Proceedings of the 4th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 259--268.
  33. Liling Tan, Rohit Gupta, and Josef van Genabith. 2015. USAAR-WLV: Hypernym generation with deep neural nets. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). Association for Computational Linguistics, Denver, Colorado, 932--937. http://www.aclweb.org/anthology/S15-2155.
  34. Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, and Hang Li. 2016. Neural machine translation with reconstruction. In Proceedings of the National Conference on Artificial Intelligence (2016), 3097--3103.
  35. Derek F. Wong, Yi Lu, and Lidia S. Chao. 2016. Bilingual recursive neural network based data selection for statistical machine translation. New Avenues in Knowledge Bases for Natural Language Processing. Knowledge-Based Systems 108 (2016), 15--24.
  36. Hui Yu, Qingsong Ma, Xiaofeng Wu, and Qun Liu. 2015. CASICT-DCU participation in WMT2015 metrics task. In Proceedings of the 10th Workshop on Statistical Machine Translation. Association for Computational Linguistics, Lisbon, Portugal, 417--421. http://aclweb.org/anthology/W15-3053.
  37. Hui Yu, Xiaofeng Wu, Jun Xie, Wenbin Jiang, Qun Liu, and Shouxun Lin. 2014. RED: A reference dependency based MT evaluation metric. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin City University and Association for Computational Linguistics, Dublin, Ireland, 2042--2051. http://www.aclweb.org/anthology/C14-1193.
  38. Hui Yu, Weizhi Xu, Shouxun Lin, and Qun Liu. 2017. ENTF: An entropy-based MT evaluation metric. In Machine Translation, Derek F. Wong and Deyi Xiong (Eds.). Springer Singapore, Singapore, 68--77.
  39. Yue Zhang and Stephen Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing using beam-search. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 562--571.
  40. Junguo Zhu, Muyun Yang, Bo Wang, Sheng Li, and Tiejun Zhao. 2010. All in strings: A powerful string-based automatic MT evaluation metric with multiple granularities. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING’10). Association for Computational Linguistics, Stroudsburg, PA, 1533--1540. http://dl.acm.org/citation.cfm?id=1944566.1944741

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 18, Issue 4
      December 2019
      305 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3327969

      Copyright © 2019 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 June 2019
      • Accepted: 1 February 2019
      • Revised: 1 November 2018
      • Received: 1 July 2018

      Qualifiers

      • research-article
      • Research
      • Refereed
