research-article

A Hybrid Model for Named Entity Recognition on Chinese Electronic Medical Records

Published: 23 April 2021

Abstract

Electronic medical records (EMRs) contain valuable patient information, such as clinical symptoms, diagnostic results, and medications. Named entity recognition (NER) aims to recognize entities in unstructured text and is the first step toward semantic understanding of EMRs. Extracting medical information from Chinese EMRs can be more complicated because of the differences between English and Chinese. Some researchers have recognized the importance of Chinese NER and have applied recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to the task. However, it remains an open question whether performance improves when the advantages of both RNNs and CNNs are combined. Moreover, RoBERTa-WWM, a pre-trained model, can generate embeddings with word-level features, which makes it more suitable for Chinese NER than Word2Vec. In this article, we propose a hybrid model. The model first obtains the entities identified separately by a bidirectional long short-term memory (BiLSTM) network and a CNN, and then applies two hybrid strategies to these entities to produce the final results. We conduct experiments on raw medical records from real hospitals, using the dataset provided by the China Conference on Knowledge Graph and Semantic Computing in 2019 (CCKS 2019). Results demonstrate that the hybrid model can improve performance significantly.
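The abstract does not say what the two hybrid strategies are. As a rough illustration of the general idea only, the sketch below decodes the tag sequences of two taggers into entity spans and merges them by set union or set intersection. These two merge rules, the BIO tagging scheme, the function names `bio_to_entities` and `merge_entities`, and the labels `DRUG` and `SYM` are all assumptions made for this example, not the authors' method.

```python
from typing import List, Set, Tuple

# An entity span: (start index, end index exclusive, entity type).
Entity = Tuple[int, int, str]


def bio_to_entities(tags: List[str]) -> Set[Entity]:
    """Decode a BIO tag sequence into a set of (start, end, type) spans."""
    entities: Set[Entity] = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes a span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def merge_entities(a: Set[Entity], b: Set[Entity], strategy: str) -> Set[Entity]:
    """Combine two entity sets (hypothetical strategies, not from the paper)."""
    if strategy == "union":         # keep anything either model found
        return a | b
    if strategy == "intersection":  # keep only what both models agree on
        return a & b
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up predictions from two taggers over the same five characters.
bilstm_tags = ["B-DRUG", "I-DRUG", "O", "B-SYM", "I-SYM"]
cnn_tags = ["B-DRUG", "I-DRUG", "O", "O", "B-SYM"]

bilstm_entities = bio_to_entities(bilstm_tags)
cnn_entities = bio_to_entities(cnn_tags)

print(sorted(merge_entities(bilstm_entities, cnn_entities, "union")))
print(sorted(merge_entities(bilstm_entities, cnn_entities, "intersection")))
```

In this toy setup, union tends to favor recall and intersection tends to favor precision. Whether the paper's strategies behave this way is not stated in the abstract.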

