Abstract
In recent years, informatization in various fields has produced a large number of Chinese electronic texts, and identifying specific entities in these texts has become a major research focus. Most existing methods use radicals to extract the glyph features of Chinese characters, but this approach has limitations. This paper extracts the features of Chinese characters from three aspects (glyph features, phonetic features, and character features) and improves on the conventional extraction method for each kind of feature. A new named entity recognition method (AIP) is proposed: it transforms Chinese characters into corresponding images for glyph feature extraction, divides pinyin into initials, vowels, and tones for phonetic feature extraction, and fine-tunes the A Lite BERT (ALBERT) model for character feature extraction to improve the performance of the model. This paper compares the performance of the AIP model with that of mainstream neural network models on Chinese named entity recognition tasks, using both a commonly used data set and a domain-specific data set. The results show that AIP outperforms the related work. The F1 values on the two data sets are 94.4% and 80.5%, respectively, which validates the model's versatility.
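The phonetic step described in the abstract, dividing pinyin into an initial, a vowel part (usually called the final), and a tone, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_pinyin`, the tone-numbered input format (for example `zhong1`), the use of 5 for the neutral tone, and the choice to treat `y` and `w` as initials are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical helper for illustration only; not taken from the paper.
# Two-letter initials are listed first so that "zh" is matched before "z".
# Assumption: "y" and "w" are treated as initials, which some schemes do not do.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_pinyin(syllable: str) -> Tuple[str, str, int]:
    """Split a tone-numbered pinyin syllable into (initial, final, tone)."""
    syllable = syllable.strip().lower()
    tone = 5  # assumption: 5 marks the neutral tone when no digit is given
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # Zero-initial syllables such as "an" or "er" have no consonant onset.
    return "", syllable, tone


if __name__ == "__main__":
    for s in ("zhong1", "guo2", "an4", "ma"):
        print(s, split_pinyin(s))
```

Each of the three parts could then be mapped to its own embedding and combined into a phonetic feature vector; how AIP actually encodes and fuses these parts is not specified in the abstract.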
Index Terms
AIP: A Named Entity Recognition Method Combining Glyphs and Sounds