
Multi-task Label-wise Transformer for Chinese Named Entity Recognition

Published: 24 March 2023

Abstract

Benefiting from improved positional encoding and the introduction of lexical knowledge, the Transformer has outperformed the previously prevailing BiLSTM-based models on the named entity recognition (NER) task. However, existing Transformer-based models for Chinese NER pay little attention to the information captured by the bottom layers of the Transformer or to the significance of the representation subspace into which each attention head projects. In this article, we propose the Multi-Task Label-Wise Transformer (MTLWT). From a global perspective, we assign the entity boundary prediction (EBP) and entity type prediction (ETP) tasks to the first two layers, encouraging the lower layers to participate more in constructing character representations. In addition, within each multi-head self-attention (MHSA) layer, we give each individual head a specific focus, so that the head projects into a meaningful subspace. Experiments on four datasets from different domains show that our proposed model achieves performance comparable to other state-of-the-art models. In particular, MTLWT outperforms all other frameworks that use no external knowledge on all four datasets.
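The abstract describes two architectural ideas: auxiliary entity boundary prediction (EBP) and entity type prediction (ETP) tasks attached to the first two encoder layers, and a label-wise focus for each attention head. Below is a minimal PyTorch sketch of the multi-task part only; the layer indices, tag-set sizes, and hyperparameters are illustrative assumptions, and the label-wise head mechanism is not reproduced, so this is not the authors' implementation.

```python
# Minimal sketch: auxiliary EBP/ETP classifiers on the two bottom
# Transformer layers, with the main NER tagger on the top layer.
# All dimensions and tag-set sizes are assumptions for illustration,
# not values from the paper.
import torch
import torch.nn as nn

class MultiTaskEncoderSketch(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_heads=8, n_layers=4,
                 n_boundary_tags=4, n_type_tags=5, n_ner_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Keep the encoder layers in a ModuleList so that intermediate
        # outputs are accessible to the auxiliary heads.
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.ebp_head = nn.Linear(d_model, n_boundary_tags)  # e.g. B/I/E/O
        self.etp_head = nn.Linear(d_model, n_type_tags)      # e.g. PER/LOC/ORG/...
        self.ner_head = nn.Linear(d_model, n_ner_tags)       # full BIO tag set

    def forward(self, token_ids):
        x = self.embed(token_ids)
        ebp_logits = etp_logits = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == 0:      # first layer: entity boundary prediction
                ebp_logits = self.ebp_head(x)
            elif i == 1:    # second layer: entity type prediction
                etp_logits = self.etp_head(x)
        ner_logits = self.ner_head(x)  # top layer: main NER task
        return ebp_logits, etp_logits, ner_logits
```

Training such a model would sum weighted token-level cross-entropy losses over the three outputs, which is what pushes boundary and type information into the lower layers; the label-wise head foci described in the abstract would additionally constrain each attention head's subspace within every MHSA layer.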



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 4 (April 2023), 682 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3588902


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 24 March 2023
• Online AM: 18 January 2023
• Accepted: 26 November 2022
• Revised: 14 July 2022
• Received: 3 November 2021
