Multi-task Label-wise Transformer for Chinese Named Entity Recognition

Abstract
Benefiting from improved positional encoding and the introduction of lexical knowledge, the Transformer has surpassed the previously prevailing BiLSTM-based models on the named entity recognition (NER) task. However, existing Transformer-based models for Chinese NER pay little attention to the information captured by the bottom layers of the Transformer, or to the significance of the representation subspaces into which the individual attention heads project. In this article, we propose the Multi-Task Label-Wise Transformer (MTLWT). At the global level, we assign an entity boundary prediction (EBP) task and an entity type prediction (ETP) task to the first two layers, encouraging the lower layers to participate more in constructing the character representations. In addition, within each multi-head self-attention (MHSA) layer, we give each individual head a specific focus, so that the head projects into a meaningful subspace. Experiments on four datasets from different domains show that the proposed model achieves performance comparable to other state-of-the-art models; in particular, MTLWT outperforms all other frameworks that use no external knowledge on all four datasets.
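To make the multi-task layout concrete, the following is a minimal PyTorch-style sketch of the idea described above, not the authors' implementation: auxiliary EBP and ETP classifiers are attached to the outputs of the first two encoder layers so that gradients from the auxiliary tasks flow directly into the bottom of the stack. All names and sizes (`MTLWTSketch`, `n_boundary_tags`, `n_entity_types`, layer counts) are placeholder assumptions, and the label-wise per-head focus inside MHSA is omitted for brevity.

```python
# Minimal sketch (assumed names/dimensions, not the paper's code) of attaching
# auxiliary EBP/ETP heads to the two bottom layers of a Transformer encoder.
import torch.nn as nn


class MTLWTSketch(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_heads=8, n_layers=4,
                 n_boundary_tags=4, n_entity_types=5, n_ner_tags=13):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_layers))
        # Auxiliary classifiers: layer 1 predicts entity boundaries (EBP),
        # layer 2 predicts entity types (ETP); tag-set sizes are placeholders.
        self.ebp_head = nn.Linear(d_model, n_boundary_tags)
        self.etp_head = nn.Linear(d_model, n_entity_types)
        # Main NER classifier on the top layer (the paper's decoder may differ,
        # e.g., a CRF; a plain linear layer is used here for brevity).
        self.ner_head = nn.Linear(d_model, n_ner_tags)

    def forward(self, char_ids):
        h = self.embed(char_ids)              # (batch, seq_len, d_model)
        aux = {}
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i == 0:
                aux["ebp"] = self.ebp_head(h)  # boundary logits from layer 1
            elif i == 1:
                aux["etp"] = self.etp_head(h)  # type logits from layer 2
        return self.ner_head(h), aux           # main NER logits + aux logits
```

In a setup like this, training would sum the main tagging loss with the two auxiliary cross-entropy losses under tuned weights (e.g., `loss = ner_loss + λ1 · ebp_loss + λ2 · etp_loss`), so the EBP and ETP supervision stimulates the lower layers as the abstract describes.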