Abstract
Neural machine translation has achieved remarkable progress over the past several years; however, little attention has been paid to machine translation (MT) between Japanese and Chinese, which share a large proportion of cognate words that can be used as additional linguistic knowledge to enhance translation performance. In this article, we seek to strengthen the semantic correlation between Japanese and Chinese by leveraging cognate words that share common Chinese characters. Specifically, we experiment with three strategies: (1) a shared vocabulary with cognate lexicon induction, which models the commonality between source and target cognates; (2) a shared private representation with a dynamic gating mechanism, which models the language-specific features on the source side; and (3) an embedding shortcut, which gives the decoder access to the shared private representation over the shortest path and aids the training process. The experiments and analysis presented in this article demonstrate that our proposed approaches significantly improve the performance of both Japanese-to-Chinese and Chinese-to-Japanese translation and verify the effectiveness of exploiting Japanese–Chinese cognates for MT.
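The first two strategies can be illustrated with a minimal sketch. This is not the article's implementation: the variant table `JA_TO_ZH_VARIANTS`, the cognate criterion in `is_cognate_candidate`, the helper `build_shared_vocab`, and the scalar sigmoid gate in `gated_mix` are all hypothetical simplifications chosen for illustration, since the abstract does not specify the exact formulation.

```python
import math

# Hypothetical, tiny table mapping Japanese kanji forms to simplified Chinese
# forms. A real system would need a complete character mapping resource.
JA_TO_ZH_VARIANTS = {"経": "经", "済": "济"}


def is_han(ch):
    """True if ch lies in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def normalize(word):
    """Replace Japanese character variants with their Chinese counterparts."""
    return "".join(JA_TO_ZH_VARIANTS.get(ch, ch) for ch in word)


def is_cognate_candidate(ja_word, zh_word):
    """Assumed criterion: both words are all-Han and match after normalization."""
    if not ja_word or not all(is_han(c) for c in ja_word + zh_word):
        return False
    return normalize(ja_word) == zh_word


def build_shared_vocab(ja_words, zh_words):
    """Assign token IDs so that cognate candidates share a single ID."""
    vocab = {}
    next_id = 0
    for w in zh_words:
        if ("zh", w) not in vocab:
            vocab[("zh", w)] = next_id
            next_id += 1
    for w in ja_words:
        if ("ja", w) in vocab:
            continue
        norm = normalize(w)
        if ("zh", norm) in vocab and is_cognate_candidate(w, norm):
            # Cognate: reuse the ID (and hence the embedding row) of the Chinese word.
            vocab[("ja", w)] = vocab[("zh", norm)]
        else:
            vocab[("ja", w)] = next_id
            next_id += 1
    return vocab


def gated_mix(shared, private, weights, bias):
    """Blend a shared and a private vector with one sigmoid gate (assumed form)."""
    z = sum(w * x for w, x in zip(weights, shared + private)) + bias
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * s + (1.0 - g) * p for s, p in zip(shared, private)]


vocab = build_shared_vocab(["経済", "学", "の"], ["经济", "学", "的"])
mixed = gated_mix([1.0, 0.0], [0.0, 1.0], [0.0, 0.0, 0.0, 0.0], 0.0)
```

In this sketch, `経済` and `经济` receive the same ID and would therefore share one embedding, while the function words `の` and `的` keep separate IDs. With all-zero gate weights the gate value is 0.5, so `mixed` is the plain average of the two vectors; in a trained model the gate would be learned and computed per token.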
Exploiting Japanese–Chinese Cognates with Shared Private Representations for NMT