Abstract
English and Hindi have markedly different word orders: English follows subject-verb-object (SVO) order, whereas Hindi primarily follows subject-object-verb (SOV) order. This difference makes the pair challenging to model for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and the reordering models, and it is generally performed during decoding by transposing words within a fixed window. Such systems handle local reorderings, and some phrase-level reorderings are captured when phrases are formed, but they are weak at learning long-distance reorderings. To overcome this weakness, researchers have applied reordering as a pre-processing step that brings the source sentence closer to the word order of the target language. These approaches rely on part-of-speech (POS) tag sequences, or reorder the syntax tree using grammatical rules or head finalization. This study shows that head finalization alone is not sufficient for reordering sentences in the English-Hindi language pair. It describes the relevant grammatical constructs and compares reorderings against both the original and the head-finalized representations. The impact of reordering on translation quality is measured by BLEU score for both phrase-based statistical and neural machine translation systems. Reordering the different grammatical constructs yielded significant gains in BLEU score.
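To make the head-finalization baseline concrete, the following is a minimal sketch of the core rule in a simplified form: every syntactic head is moved after all of its dependents, while the dependents keep their original relative order. It assumes a toy tree in which prepositions head their phrases (as in HPSG-style analyses) and children are stored in sentence order; the `Node` class and `head_finalize` function are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A token in a toy syntax tree (hypothetical structure for illustration).

    `children` holds the dependents of this head in original sentence order.
    """
    word: str
    children: List["Node"] = field(default_factory=list)


def head_finalize(node: Node) -> List[str]:
    """Linearize a subtree so that each head follows all of its dependents.

    Dependents are emitted first (recursively head-finalized, in their
    original relative order), then the head itself.
    """
    out: List[str] = []
    for child in node.children:
        out.extend(head_finalize(child))
    out.append(node.word)
    return out


# "Ram reads a book in the library": the verb heads the clause and,
# by assumption, the preposition "in" heads its phrase.
sentence = Node("reads", [
    Node("Ram"),
    Node("book", [Node("a")]),
    Node("in", [Node("library", [Node("the")])]),
])

reordered = head_finalize(sentence)
print(" ".join(reordered))  # → Ram a book the library in reads
```

The output places the verb last and turns the preposition into a postposition, which mirrors Hindi SOV order with postpositions (compare "pustakalay mein", "library in"). Real head-finalization systems add exceptions to this bare rule, and the study above argues that the rule alone is not sufficient for English-Hindi.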
Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders