Abstract
There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing work on QE is based on supervised approaches that use quality-annotated training data. However, the quality scores in QE training data readily become imbalanced or skewed: the data consist mostly of sentence pairs with high translation quality and lack sentence pairs with low translation quality. A quality estimator induced from such imbalanced data tends to produce biased scores, assigning “high” translation quality scores even to poorly translated sentences. To address the data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing, which constructs more balanced QE training data by introducing greater uniformity into the training data. The procedure relies on preparing two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data in a naive manner, without considering the imbalance, by manually annotating the quality of MT outputs. Second, we obtain near-uniform data in a selective manner by manually annotating only a subset selected from automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining these two types of data, with one half originating from the default skewed data and the other half from the near-uniform data. We expect that uniformly interpolated balancing reflects the intrinsic skewness of the true quality distribution while managing the imbalance problem.
Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing yields robust performance on both skewed and uniformly distributed quality test sets, compared with the other, non-balanced datasets.
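The data construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `select_near_uniform` and `interpolate_balanced` are hypothetical, scores are assumed to be normalized to [0, 1], and equal-width binning of the automatically estimated scores is only one plausible way to pick a near-uniform subset for manual annotation. The abstract itself specifies only that the subset is selected from automatically quality-estimated sentence pairs and that the final data are half skewed and half near-uniform.

```python
import random
from collections import defaultdict


def select_near_uniform(scored_pairs, n_bins, per_bin, seed=0):
    """Pick up to `per_bin` sentence pairs from each estimated-quality bin.

    `scored_pairs` is a list of (pair_id, estimated_score) tuples with
    scores in [0, 1]. Equal-width binning is an assumption of this sketch.
    The selected pairs would then be sent for manual annotation.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pair_id, score in scored_pairs:
        # Clamp the index so that a score of exactly 1.0 lands in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append(pair_id)
    selected = []
    for index in range(n_bins):
        candidates = bins[index]
        rng.shuffle(candidates)
        selected.extend(candidates[:per_bin])
    return selected


def interpolate_balanced(skewed, near_uniform, total, seed=0):
    """Combine the two annotated sources, taking one half from each."""
    half = total // 2
    if len(skewed) < half or len(near_uniform) < half:
        raise ValueError("each source needs at least total // 2 items")
    rng = random.Random(seed)
    combined = rng.sample(skewed, half) + rng.sample(near_uniform, half)
    rng.shuffle(combined)
    return combined
```

In this sketch, `select_near_uniform` would run on a large pool of MT outputs scored by an existing estimator, and `interpolate_balanced` would run once both the naively sampled data and the selected subset have been manually annotated. The 50/50 split follows the abstract; the fixed seeds are there only to make the sampling repeatable.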
Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation: A Case Study of English-Korean Translation