short-paper

Description of the Chinese-to-Spanish Rule-Based Machine Translation System Developed Using a Hybrid Combination of Human Annotation and Statistical Techniques

Published: 21 November 2015

Abstract

Two of the most popular Machine Translation (MT) paradigms are rule-based MT (RBMT) and corpus-based MT, which includes statistical MT (SMT). When only scarce parallel corpora are available, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair.

This article presents the first RBMT system for Chinese-to-Spanish translation. We describe a hybrid method for building it that takes advantage of available resources such as parallel corpora, from which dictionaries and lexical and structural transfer rules are extracted.
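
As a rough illustration of how a parallel corpus can seed a bilingual dictionary, the sketch below counts word-alignment links and keeps the most frequent Spanish translation of each Chinese word. It is not the authors' procedure, which the abstract does not detail. It assumes that the Chinese side is already word-segmented and that word alignments are available as "i-j" index pairs (the format of Moses-style symmetrized alignment files). The function names extract_dictionary and parse_alignment are hypothetical.

```python
from collections import Counter, defaultdict


def parse_alignment(line):
    """Parse an alignment string such as '0-0 1-2' into (src_index, tgt_index) pairs.

    Assumption: links are written as 'i-j', space-separated, 0-based.
    """
    return [tuple(int(k) for k in link.split("-")) for link in line.split()]


def extract_dictionary(src_sents, tgt_sents, alignments, min_count=1):
    """Build a one-best bilingual dictionary from word-aligned sentence pairs.

    Hypothetical helper, not the paper's method.
    src_sents: list of token lists (pre-segmented Chinese).
    tgt_sents: list of token lists (Spanish).
    alignments: per sentence pair, a list of (i, j) links from src[i] to tgt[j].
    min_count: drop entries whose best translation was seen fewer times.
    """
    # Count how often each source word is linked to each target word.
    counts = defaultdict(Counter)
    for src, tgt, links in zip(src_sents, tgt_sents, alignments):
        for i, j in links:
            counts[src[i]][tgt[j]] += 1

    # Keep only the most frequent translation per source word.
    dictionary = {}
    for src_word, candidates in counts.items():
        tgt_word, freq = candidates.most_common(1)[0]
        if freq >= min_count:
            dictionary[src_word] = tgt_word
    return dictionary


if __name__ == "__main__":
    src = [["我", "看", "猫"], ["我", "喜欢", "猫"], ["我", "看", "书"]]
    tgt = [["yo", "veo", "gatos"], ["me", "gustan", "los", "gatos"], ["yo", "veo", "libros"]]
    links = [parse_alignment(a) for a in ["0-0 1-1 2-2", "0-0 1-1 2-3", "0-0 1-1 2-2"]]
    print(extract_dictionary(src, tgt, links, min_count=2))
```

With min_count=2 only the words seen at least twice with the same translation survive (我 → yo, 看 → veo, 猫 → gatos). A real RBMT pipeline would work on lemmas with part-of-speech tags rather than surface forms, and would filter noisy alignment links more carefully than a single frequency threshold.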

The final system is open source and freely available online. Although its performance lags behind that of a standard SMT system on an in-domain test set, the results show that the RBMT system's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. Because the system is publicly available, it can be further enhanced, and it opens up the possibility of building hybrid MT systems in the future.
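
The abstract does not define coverage. A common, simple notion for RBMT systems is naive coverage: the share of running source tokens for which the system's dictionaries have an entry. The sketch below computes that figure under this assumed definition, using a plain Python dict as a stand-in for the system's dictionaries; naive_coverage is a hypothetical helper.

```python
from collections import Counter


def naive_coverage(tokens, dictionary):
    """Return (coverage, unknown) for a tokenized source-language test set.

    Assumed definition: coverage = fraction of running tokens that have an
    entry in `dictionary` (a plain dict standing in for the RBMT dictionaries).
    unknown: Counter of out-of-vocabulary tokens and their frequencies.
    """
    if not tokens:
        return 0.0, Counter()
    unknown = Counter(tok for tok in tokens if tok not in dictionary)
    known = len(tokens) - sum(unknown.values())
    return known / len(tokens), unknown


if __name__ == "__main__":
    dictionary = {"我": "yo", "看": "veo"}
    print(naive_coverage(["我", "看", "狗", "狗"], dictionary))
```

Computing this value separately on an in-domain and an out-of-domain test set gives a quick, translation-free view of how well the dictionaries generalize, and the unknown counter lists the most frequent words worth adding next.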

Published in: ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 15, Issue 1, January 2016 (89 pages)
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/2847552

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Published: 21 November 2015
• Accepted: 1 February 2015
• Revised: 1 December 2014
• Received: 1 February 2014

Qualifiers
• short-paper
• Research
• Refereed