Abstract
The recognition of entities in electronic medical records (EMRs) is especially important for downstream tasks such as clinical entity normalization and medical dialogue understanding. However, in the medical domain, training a high-quality named entity recognition system generally requires large-scale annotated datasets, which are highly expensive to obtain. In this article, to lower the cost of data annotation and make the most of unlabeled data, we propose a hybrid approach to recognizing entities in Chinese electronic medical records that combines loss-based active learning with semi-supervised learning. Specifically, we adopt a dynamic balance strategy that balances, at different stages of the process, the minimum loss predicted by a named entity recognition decoder and by a loss prediction module. Experimental results demonstrate the effectiveness and efficiency of the proposed framework, which achieves higher performance than existing approaches on Chinese EMR entity recognition datasets under limited labeling resources.
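The abstract does not give the exact selection rule, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that each unlabeled sentence has two loss estimates (one from the NER decoder, one from the loss prediction module), that a weight blends them and shifts linearly across active learning rounds, that the highest-scoring sentences go to human annotators, and that the lowest-scoring ones receive pseudo-labels for semi-supervised training. The function names `balance_weight` and `split_pool` and the linear schedule are hypothetical.

```python
from typing import List, Tuple


def balance_weight(round_idx: int, total_rounds: int) -> float:
    """Hypothetical linear schedule giving the weight on the decoder loss.

    Early rounds trust the NER decoder's own loss; later rounds shift the
    weight toward the loss prediction module. This is an assumption made
    for illustration, not the paper's formula.
    """
    if total_rounds <= 1:
        return 0.5
    return 1.0 - round_idx / (total_rounds - 1)


def split_pool(decoder_losses: List[float],
               predicted_losses: List[float],
               round_idx: int,
               total_rounds: int,
               n_query: int,
               n_pseudo: int) -> Tuple[List[int], List[int]]:
    """Return (indices to annotate, indices to pseudo-label).

    Highest blended loss -> sent to human annotators (active learning).
    Lowest blended loss  -> labeled by the model itself (semi-supervised).
    """
    lam = balance_weight(round_idx, total_rounds)
    # Blend the two loss estimates for every unlabeled sentence.
    scores = [lam * d + (1.0 - lam) * p
              for d, p in zip(decoder_losses, predicted_losses)]
    # Sentence indices sorted from lowest to highest blended loss.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    query = order[::-1][:n_query]
    chosen = set(query)
    # Never pseudo-label a sentence that is already queued for annotation.
    pseudo = [i for i in order[:n_pseudo] if i not in chosen]
    return query, pseudo


if __name__ == "__main__":
    decoder = [0.9, 0.1, 0.5, 0.05, 0.7]    # toy decoder losses
    predicted = [0.2, 0.3, 0.8, 0.1, 0.6]   # toy loss-module predictions
    print(split_pool(decoder, predicted, 0, 3, 2, 2))  # ([0, 4], [3, 1])
```

With these toy numbers, the first round relies only on the decoder losses, while the last round (`round_idx=2`) relies only on the loss prediction module and therefore queries sentences 2 and 4 instead. A real system would obtain both estimates from a trained model (for example, a CRF decoder and a learned loss head) rather than from fixed lists.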
Combination of Loss-based Active Learning and Semi-supervised Learning for Recognizing Entities in Chinese Electronic Medical Records