Abstract
Two of the most popular machine translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When parallel corpora are scarce, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair.
This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources, such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules.
The final system is freely available online and open source. Although its performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems.
Index Terms
Description of the Chinese-to-Spanish Rule-Based Machine Translation System Developed Using a Hybrid Combination of Human Annotation and Statistical Techniques