short-paper

Multilingual BERT-based Word Alignment By Incorporating Common Chinese Characters

Published: 19 June 2023

Abstract

Word alignment is the task of identifying translation equivalents between the words of a sentence pair. Although word alignment is no longer strictly required for neural machine translation, it remains useful in a wealth of applications, e.g., bilingual lexicon induction and constrained decoding. Nevertheless, the best-known word aligners are still Giza++ and fastAlign, both of which are implementations of the traditional IBM models. To keep pace with advances in NMT, there has been a surge of interest in replacing the IBM models with neural models. We follow this trend, but aim to boost the performance of word alignment between Japanese and Chinese, two languages that share a large inventory of Chinese characters. Our key idea is to use these common Chinese characters as an indicator for inferring alignment: a source word and a target word that share Chinese characters are most likely aligned. Following this idea, we propose three methods that leverage common Chinese characters to improve mBERT-based word alignment: a reward factor, representation alignment, and contrastive training. Furthermore, we annotate and release a gold dataset for Japanese-Chinese word alignment. Experiments on this dataset show that our methods outperform several strong baselines in terms of AER and verify the effectiveness of exploiting common Chinese characters.
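The reward-factor idea described above can be sketched in a few lines: boost each cell of a word-similarity matrix (e.g., one derived from mBERT embeddings) by a score measuring how many CJK ideographs the two words share. This is a minimal illustration, not the paper's exact formulation; the overlap measure (Jaccard over Han characters) and the weight `lam` are assumptions chosen for clarity.

```python
# Hedged sketch of a shared-Chinese-character reward for word alignment.
# The overlap function and the weight `lam` are illustrative assumptions.

def han_overlap(src_word: str, tgt_word: str) -> float:
    """Jaccard overlap of the CJK Unified Ideographs (U+4E00..U+9FFF)
    appearing in the two words; 0.0 if either word has none."""
    def han_chars(word: str) -> set:
        return {c for c in word if '\u4e00' <= c <= '\u9fff'}
    s, t = han_chars(src_word), han_chars(tgt_word)
    if not s or not t:
        return 0.0
    return len(s & t) / len(s | t)

def rewarded_similarity(sim, src_words, tgt_words, lam=0.5):
    """Add lam * overlap to each cell of a similarity matrix
    (list of lists), so that word pairs sharing Han characters
    become more likely to be chosen as alignments."""
    return [
        [sim[i][j] + lam * han_overlap(sw, tw)
         for j, tw in enumerate(tgt_words)]
        for i, sw in enumerate(src_words)
    ]

# Toy example: Japanese 学生 and Chinese 学生 share both characters,
# so their similarity cell gets the full reward.
src = ["学生", "です"]          # Japanese: "(is a) student"
tgt = ["是", "学生"]            # Chinese: "is a student"
sim = [[0.1, 0.4], [0.5, 0.2]]  # made-up base similarities
boosted = rewarded_similarity(sim, src, tgt)
```

After the boost, the 学生–学生 cell rises from 0.4 to 0.9, while cells with no shared ideographs (e.g., です–是, since です contains no Han characters) are left unchanged.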




Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 6 (June 2023), 635 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3604597


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 August 2022
• Revised: 15 January 2023
• Accepted: 19 April 2023
• Online AM: 26 April 2023
• Published: 19 June 2023
