
AIP: A Named Entity Recognition Method Combining Glyphs and Sounds

Published: 12 November 2022

Abstract

In recent years, large volumes of Chinese electronic text have been produced as information systems are built across many fields, and identifying specific entities in these texts has become a major research focus. Most existing methods use radicals to extract the glyph features of Chinese characters, but this approach has clear limitations. This paper extracts features of Chinese characters from three aspects: glyph features, phonetic features, and character features, and improves the conventional extraction method for each. We propose a new named entity recognition method (AIP) that converts Chinese characters into images for glyph feature extraction, decomposes pinyin into initials, finals, and tones for phonetic feature extraction, and fine-tunes the A Lite BERT (ALBERT) model for character feature extraction. We compare the AIP model against mainstream neural network models on Chinese named entity recognition tasks, using both a commonly used data set and a domain-specific data set. AIP outperforms the related work, reaching F1 values of 94.4% and 80.5% on the two data sets, respectively, which validates the model's versatility.
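The pinyin decomposition described above can be illustrated with a short sketch. This is a minimal, assumption-laden Python example, not the authors' implementation: it assumes tone-numbered pinyin input (e.g. "zhong1"), uses a standard table of Mandarin initials, and the `split_pinyin` helper name and its handling of zero-initial syllables are illustrative choices.

```python
# Illustrative sketch of splitting a tone-numbered pinyin syllable into
# (initial, final, tone) triples, the three phonetic features the paper uses.
# NOT the paper's code; the syllable handling here is a simplifying assumption.

# Standard Mandarin initials, sorted longest-first so "zh" matches before "z".
INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
     "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s", "y", "w"],
    key=len, reverse=True,
)

def split_pinyin(syllable: str):
    """Split e.g. 'zhong1' into ('zh', 'ong', 1); tone 0 means no tone mark."""
    tone = 0
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    # Zero-initial syllables (e.g. 'ai') yield an empty initial.
    initial = next((i for i in INITIALS if syllable.startswith(i)), "")
    final = syllable[len(initial):]
    return initial, final, tone

print(split_pinyin("zhong1"))  # ('zh', 'ong', 1)
print(split_pinyin("ai4"))     # ('', 'ai', 4)
```

Each of the three components could then be embedded separately and concatenated into a phonetic feature vector; the paper itself does not specify this fusion in the abstract, so that step is left out here.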


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
  November 2022, 372 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3568970

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 November 2022
      • Online AM: 15 March 2022
      • Accepted: 25 February 2022
      • Revised: 10 February 2022
      • Received: 6 August 2021

      Qualifiers

      • research-article
      • Refereed