Abstract
Electronic medical records (EMRs) contain valuable information about patients, such as clinical symptoms, diagnostic results, and medications. Named entity recognition (NER) aims to recognize entities in unstructured text and is the initial step toward semantic understanding of EMRs. Extracting medical information from Chinese EMRs can be a more complicated task because of the differences between English and Chinese. Some researchers have recognized the importance of Chinese NER and applied a recurrent neural network (RNN) or a convolutional neural network (CNN) to the task. However, it remains to be seen whether performance improves when the advantages of the RNN and the CNN are used together. Moreover, RoBERTa-WWM, a pre-trained model, can generate embeddings with word-level features, which makes it more suitable for Chinese NER than Word2Vec. In this article, we propose a hybrid model. The model first obtains the entities identified by a bidirectional long short-term memory (BiLSTM) network and by a CNN, respectively, and then applies two hybrid strategies to these entities to produce the final results. We conduct experiments on raw medical records from real hospitals, using the dataset provided by the China Conference on Knowledge Graph and Semantic Computing in 2019 (CCKS 2019). The results demonstrate that the hybrid model improves performance significantly.
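The idea of combining the entity sets predicted by two taggers can be illustrated with a minimal sketch. The abstract does not say what the two hybrid strategies are, so the two shown here (keeping only entities both models agree on, and merging both sets while preferring the BiLSTM prediction on overlap) are illustrative assumptions, not the article's method. The `Entity` type, the function names, and the labels are likewise hypothetical.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """A predicted entity span (hypothetical representation)."""
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "DISEASE", "DRUG"


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_intersection(bilstm: List[Entity], cnn: List[Entity]) -> List[Entity]:
    """Assumed strategy 1: keep only entities predicted identically by both models."""
    return sorted(set(bilstm) & set(cnn))


def merge_union(bilstm: List[Entity], cnn: List[Entity]) -> List[Entity]:
    """Assumed strategy 2: keep all BiLSTM entities, then add CNN entities
    that do not overlap anything already kept."""
    merged = list(dict.fromkeys(bilstm))  # de-duplicate, preserve order
    for ent in cnn:
        if not any(overlaps(ent, kept) for kept in merged):
            merged.append(ent)
    return sorted(merged)


# Toy predictions from the two models over the same sentence.
bilstm_entities = [Entity(0, 3, "DISEASE"), Entity(5, 8, "DRUG")]
cnn_entities = [Entity(0, 3, "DISEASE"), Entity(6, 9, "DRUG"), Entity(10, 12, "SYMPTOM")]

agreed = merge_intersection(bilstm_entities, cnn_entities)
combined = merge_union(bilstm_entities, cnn_entities)
```

Under these assumptions, the intersection strategy favors precision (an entity survives only when both models agree), while the union strategy favors recall (an entity found by either model survives unless it conflicts with a BiLSTM span).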
Index Terms
A Hybrid Model for Named Entity Recognition on Chinese Electronic Medical Records