Ancient–Modern Chinese Translation with a New Large Training Dataset

Published: 31 May 2019

Abstract

Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to preserve and carry forward this heritage. However, the lack of a large-scale parallel corpus limits research on ancient–modern Chinese machine translation. In this article, we propose an ancient–modern Chinese clause alignment approach based on the characteristics of these two languages. The method combines lexical and statistical information and achieves an F1-score of 94.2 on our manually annotated test set. We use this method to create a new large-scale ancient–modern Chinese parallel corpus that contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality ancient–modern Chinese dataset. Furthermore, we analyze and compare the performance of SMT and various NMT models on this dataset and provide a strong baseline for this task.
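
To make the idea of combining lexical and statistical information concrete, the following is a minimal Python sketch and not the authors' actual algorithm. It assumes that the lexical signal can be approximated by character overlap (ancient and modern Chinese share many characters) and the statistical signal by a length-ratio prior. The function names, the expected ratio of 1.5, the spread of 0.5, and the equal weighting are all hypothetical choices made for illustration. The `f1` helper shows how alignment quality against a manually annotated set could be scored.

```python
import math


def lexical_score(ancient: str, modern: str) -> float:
    """Fraction of distinct ancient characters that also occur in the modern clause."""
    ancient_chars = set(ancient)
    if not ancient_chars:
        return 0.0
    return len(ancient_chars & set(modern)) / len(ancient_chars)


def length_score(ancient: str, modern: str,
                 ratio: float = 1.5, sigma: float = 0.5) -> float:
    """Gaussian-shaped score that peaks when len(modern) / len(ancient) == ratio.

    `ratio` and `sigma` are illustrative assumptions, not values from the paper.
    """
    if not ancient or not modern:
        return 0.0
    observed = len(modern) / len(ancient)
    return math.exp(-((observed - ratio) ** 2) / (2 * sigma ** 2))


def combined_score(ancient: str, modern: str, weight: float = 0.5) -> float:
    """Weighted sum of the lexical and the statistical (length) signal."""
    return (weight * lexical_score(ancient, modern)
            + (1 - weight) * length_score(ancient, modern))


def align(ancient_clauses: list[str], modern_clauses: list[str]) -> list[int]:
    """For each ancient clause, independently pick the best-scoring modern clause index."""
    return [
        max(range(len(modern_clauses)),
            key=lambda j: combined_score(a, modern_clauses[j]))
        for a in ancient_clauses
    ]


def f1(predicted: set, gold: set) -> float:
    """F1 of predicted alignment pairs against gold alignment pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ancient = ["学而时习之", "不亦说乎"]
    modern = ["学习并且时常温习", "不也很愉快吗"]
    print(align(ancient, modern))
```

A realistic aligner would also need to handle one-to-many and many-to-one clause mappings, for example with dynamic programming over the two clause sequences. This sketch only matches each ancient clause to a single modern clause.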
