A BERT-Based Two-Stage Model for Chinese Chengyu Recommendation


Abstract

In Chinese, Chengyu are fixed phrases consisting of four characters. As a type of idiom, their meanings usually cannot be derived from their component characters. In this article, we study the task of recommending a Chengyu given a textual context. Observing some limitations of existing work, we propose a two-stage model: in the first stage, we re-train a Chinese BERT model by masking out Chengyu from a large Chinese corpus with wide Chengyu coverage; in the second stage, we fine-tune the re-trained, Chengyu-oriented BERT on a specific Chengyu recommendation dataset. We evaluate this method on the ChID and CCT datasets and find that it achieves state-of-the-art results on both. Ablation studies show that both stages of training are critical to the performance gain.
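To make the two stages concrete, here is a minimal runnable sketch, with the caveat that it is not the authors' released code: it assumes the Hugging Face transformers library and the public bert-base-chinese checkpoint, the example sentence and candidate list are invented, and the stage-2 ranking by masked-LM log-probability is a simplified stand-in for the paper's actual fine-tuning objective on ChID/CCT.

```python
# Minimal sketch of the two-stage recipe (illustrative assumptions, not the
# authors' released code). Chinese BERT tokenizes character by character,
# so a four-character Chengyu maps to exactly four tokens.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

# --- Stage 1: re-training with whole-Chengyu masking ---------------------
def chengyu_masked_example(text: str, chengyu: str):
    """Mask out every character of a known Chengyu so the model must
    reconstruct the whole idiom from the surrounding context."""
    masked = text.replace(chengyu, tokenizer.mask_token * len(chengyu))
    inputs = tokenizer(masked, return_tensors="pt")
    labels = tokenizer(text, return_tensors="pt")["input_ids"]
    # Compute the MLM loss only on the masked (Chengyu) positions.
    labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100
    return inputs, labels

inputs, labels = chengyu_masked_example("他做事向来一丝不苟，深受同事信赖。", "一丝不苟")
loss = model(**inputs, labels=labels).loss
loss.backward()  # one gradient step of the Chengyu-oriented re-training

# --- Stage 2: recommending a Chengyu for a blank -------------------------
def recommend(context_with_blank: str, candidates: list) -> str:
    """Rank candidate Chengyu for a blank of four [MASK] tokens by the
    model's log-probability of each candidate's characters (a simplified
    stand-in for the paper's fine-tuning on the recommendation dataset)."""
    enc = tokenizer(context_with_blank, return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        log_probs = model(**enc).logits[0, mask_pos].log_softmax(dim=-1)

    def score(c: str) -> float:
        ids = tokenizer.convert_tokens_to_ids(list(c))
        return sum(log_probs[i, t].item() for i, t in enumerate(ids))

    return max(candidates, key=score)

blank = tokenizer.mask_token * 4
print(recommend(f"他做事向来{blank}，深受同事信赖。", ["一丝不苟", "一意孤行", "见异思迁"]))
```

In the paper's setup, stage 1 applies such masking over a large corpus with wide Chengyu coverage before stage 2 fine-tunes on the recommendation dataset; the snippet above only illustrates the shape of each stage.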

References

  1. Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 2358–2367. https://doi.org/10.18653/v1/P16-1223
  2. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019a. Pre-training with whole word masking for Chinese BERT. arXiv:1906.08101. http://arxiv.org/abs/1906.08101
  3. Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, and Guoping Hu. 2019b. A span-extraction dataset for Chinese machine reading comprehension. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 5883–5889. https://doi.org/10.18653/v1/D19-1600
  4. Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'15). MIT Press, Cambridge, MA, 3079–3087. http://dl.acm.org/citation.cfm?id=2969442.2969583
  5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171–4186. https://doi.org/10.18653/v1/N19-1423
  6. Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, et al. 2019. CJRC: A reliable human-annotated benchmark dataset for Chinese judicial reading comprehension. Chinese Computational Linguistics (2019), 439–451. https://doi.org/10.1007/978-3-030-32381-3_36
  7. Paul Ekman. 1992. An argument for basic emotions. Cognition & Emotion 6, 3–4 (1992), 169–200.
  8. Chikara Hashimoto, Satoshi Sato, and Takehito Utsuro. 2006. Japanese idiom recognition: Drawing a line between literal and idiomatic meanings. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions. Association for Computational Linguistics, 353–360. https://www.aclweb.org/anthology/P06-2046
  9. Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). Curran Associates, Inc., 1693–1701. http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend.pdf
  10. Wan Yu Ho, Christine Kng, Shan Wang, and Francis Bond. 2014. Identifying idioms in Chinese translations. In LREC. 716–721.
  11. Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 328–339. https://doi.org/10.18653/v1/P18-1031
  12. Zhiying Jiang, Boliang Zhang, Lifu Huang, and Heng Ji. 2018. Chengyu cloze test. In Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, 154–158. https://doi.org/10.18653/v1/W18-0516
  13. Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics 8 (2020), 64–77. https://doi.org/10.1162/tacl_a_00300
  14. Graham Katz and Eugenie Giesbrecht. 2006. Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties. Association for Computational Linguistics, 12–19. https://www.aclweb.org/anthology/W06-1203
  15. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for self-supervised learning of language representations. In International Conference on Learning Representations. https://openreview.net/forum?id=H1eA7AEtvS
  16. Haizhou Li and Baosheng Yuan. 1998. Chinese word segmentation. In Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation. Chinese and Oriental Languages Information Processing Society, Singapore, 212–217.
  17. Dekang Lin. 1999. Automatic identification of non-compositional phrases. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 317–324. https://doi.org/10.3115/1034678.1034730
  18. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019a. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019). http://arxiv.org/abs/1907.11692
  19. Yuanchao Liu, Bo Pang, and Bingquan Liu. 2019b. Neural-based Chinese idiom recommendation for enhancing elegance in essay writing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5522–5526. https://doi.org/10.18653/v1/P19-1552
  20. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
  21. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2227–2237. https://doi.org/10.18653/v1/N18-1202
  22. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018). https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
  23. Ivan A. Sag, Timothy Baldwin, Francis Bond, Ann A. Copestake, and Dan Flickinger. 2002. Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Computational Linguistics and Intelligent Text Processing (CICLing'02). Springer-Verlag, Berlin, 1–15.
  24. Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, and Sam Tsai. 2018a. DRCD: A Chinese machine reading comprehension dataset. arXiv:1806.00920
  25. Yutong Shao, Rico Sennrich, Bonnie Webber, and Federico Fancellu. 2018b. Evaluating machine translation performance on Chinese idioms with a blacklist method. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA). https://www.aclweb.org/anthology/L18-1005
  26. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. Proceedings of the AAAI Conference on Artificial Intelligence 34, 05 (Apr. 2020), 8968–8975. https://doi.org/10.1609/aaai.v34i05.6428
  27. Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 1556–1566. https://doi.org/10.3115/v1/P15-1150
  28. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 5998–6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
  29. Lei Wang and Shiwen Yu. 2010. Construction of Chinese idiom knowledge-base and its applications. In Proceedings of the 2010 Workshop on Multiword Expressions: From Theory to Applications. Coling 2010 Organizing Committee, 11–18. https://www.aclweb.org/anthology/W10-3703
  30. Shuohang Wang and Jing Jiang. 2017. A compare-aggregate model for matching text sequences. In 5th International Conference on Learning Representations (ICLR'17), Conference Track Proceedings. https://openreview.net/forum?id=HJTzHtqee
  31. Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, and Luo Si. 2020. StructBERT: Incorporating language structures into pre-training for deep language understanding. In International Conference on Learning Representations. https://openreview.net/forum?id=BJgQ4lSFPH
  32. Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. 2020. CLUE: A Chinese language understanding evaluation benchmark. In Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Online, 4762–4772. https://www.aclweb.org/anthology/2020.coling-main.419
  33. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems. 5754–5764.
  34. Linhong Xu, Hongfei Lin, Yu Pan, Hui Ren, and Jianmei Chen. 2008. Constructing the affective lexicon ontology. Journal of the China Society for Scientific and Technical Information 2 (2008), 6.
  35. Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. 2019. ERNIE: Enhanced language representation with informative entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1441–1451. https://doi.org/10.18653/v1/P19-1139
  36. Chujie Zheng, Minlie Huang, and Aixin Sun. 2019. ChID: A large-scale Chinese idiom dataset for cloze test. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 778–787. https://doi.org/10.18653/v1/P19-1075


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 6
  November 2021, 439 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3476127

      Copyright © 2021 Association for Computing Machinery.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2021
      • Accepted: 1 February 2021
      • Revised: 1 January 2021
      • Received: 1 March 2020


