TCM Automatic Diagnosis System Based on Knowledge Graph and BERT

Artificial intelligence technology has provided significant benefits to Traditional Chinese Medicine (TCM) diagnosis. In this paper, we build a TCM automatic diagnosis system utilizing knowledge graphs and natural language processing. We first train a standard word alignment model by fine-tuning a model based on Bidirectional Encoder Representations from Transformers (BERT) to align nonstandard input with standard text. We then propose an algorithm for calculating a recommendation score for each prescription in the knowledge graph, thereby obtaining recommended results for a given set of patient symptoms. To evaluate the effectiveness of our system, we conducted experiments using our TCM diagnosis dataset. The results demonstrate that our system has the potential to serve as a TCM AI assistant with low computing resource consumption and high accuracy.


INTRODUCTION
In recent years, there has been significant progress in artificial intelligence, leading to its popularity in the medical domain. The traditional Chinese medicine (TCM) diagnosis process primarily involves identifying patients' symptoms and applying TCM theories to make comprehensive decisions [1][2], which is similar to the process of utilizing knowledge and identifying patterns from data in artificial intelligence. Therefore, the application of artificial intelligence to TCM automatic diagnosis has attracted the interest of a growing number of researchers.
Currently, there are two main paradigms for AI-assisted automatic diagnosis. One is to determine whether a patient has a disease based on his/her status collected by a single diagnostic method, chosen from visual inspection, listening and smelling, patient inquiry, and physical palpation. The other is to make comprehensive decisions by combining the four diagnostic methods mentioned above. Visual inspection benefits from computer vision technology, which extracts features related to TCM diseases from tongue [3][4] or face images [5]. Other studies [6][7] have focused on pulse or voice, treating them as digital signals and employing machine learning-based or deep learning-based approaches to perform classification. These studies only extract disease-related patterns from a single symptom, which is easy for artificial intelligence to learn. However, making comprehensive decisions based on multiple symptoms proves to be more challenging. This paradigm requires simulating doctors' reasoning processes to make accurate decisions regarding complex symptoms, using approaches including expert systems [8] and deep learning-based methods [9].
Current machine learning-based or deep learning-based methods lack interpretability and often require training on a large number of clinical cases, disregarding the essence of TCM theory. As a result, the knowledge graph, which serves as a structured semantic knowledge base [10] and can be employed in theory-related work, has emerged as a popular alternative [11]. Some studies use knowledge graph embeddings for deep learning classification. Ye et al. [12] used a knowledge graph to enhance knowledge expression in symptom features for a deep learning-based method. Weng et al. [13] improved a representation learning method for TCM knowledge graphs and performed link prediction to obtain results. Additionally, learning reasoning paths in the knowledge graph has been adopted in TCM automatic diagnosis studies. Xie et al. [14] utilized reinforcement learning to learn reasoning paths in the TCM knowledge graph, thereby obtaining complete paths from symptoms to prescriptions. Zhang et al. [15] employed the Naïve Bayes formula on the knowledge graph to calculate a score for each meta-path pattern.
While these studies excel at handling reasoning within the knowledge graph, they fall short in language processing, which is crucial due to the presence of nonstandard text in real-life clinical cases. There are several natural language processing (NLP) models in the medical field that aim to normalize synonymous symptoms. Some studies treated this task as calculating language similarity to match standard words, including DNorm [16] and Word2Vec [17]. Others treated it as a text-classification task using Bi-LSTM-CRF-based models [18] or models based on Bidirectional Encoder Representations from Transformers (BERT) [19]. The BERT-based models [20][21][22] have demonstrated state-of-the-art performance in medical term normalization, which could also help in processing nonstandard inputs in TCM diagnosis.
Inspired by previous studies, we propose a theory-based method using a knowledge graph and employ natural language processing to address nonstandard text. We fine-tune several BERT-based models on a dataset of nonstandard-standard text pairs. Furthermore, we construct a TCM knowledge graph by incorporating information from authoritative TCM books under the guidance of TCM specialists. Subsequently, we introduce an algorithm to compute recommendation scores for prescriptions based on a patient's symptom set. The experimental results demonstrate the effectiveness of our system in automatic diagnosis tasks and its ability to serve as a TCM AI assistant for patients.

METHODS AND MATERIALS

Overall structure
The overall structure of our TCM automatic diagnosis system is shown in Figure 1. First, we input the patients' symptoms into our BERT-based standard word alignment model, which offers optional standard words for patients to choose from. Once the patients have made their selections, we obtain their standard symptom sets and proceed to calculate scores for candidate prescriptions within the TCM knowledge graph. As a result, we can generate recommended prescriptions for each patient.

Train a standard word alignment model
We designed a standard word alignment model to offer standard candidates to patients when their natural language input is not in the database.
For an input symptom set, we first use the BERT tokenizer to convert it into a token sequence, denoted as Υ = {υ_1, ..., υ_n}. We feed this sequence into the BERT-based pre-trained language model and take the last hidden layer as its language embeddings, denoted as X = [x_1, ..., x_n]:

X = BERT(Υ)

Simultaneously, we build the language embedding matrix A = [a_1, ..., a_m] for all standard symptom labels Φ = {φ_1, ..., φ_m}:

A = BERT(Φ)

We employ the soft-attention mechanism [23] to establish the correlation between the input text and the standard labels. First, we calculate the attention score matrix Θ of the input symptoms X against the standard symptoms' matrix A:

Θ = XAᵀ / √d

Here, d is the dimension of the hidden layer. Then we obtain the attention weight matrix Ω through a SoftMax layer:

Ω = SoftMax(Θ)

We use these attention weights to construct the attention feature F for the input symptom text:

F = ΩA

For each element of F, we have:

f_j = Σ_i ω_ji · a_i

where ω_ji indicates the attention weight of the j-th symptom on the i-th label.
Finally, we input this feature into a fully connected network (FCN) classification layer, thus we can obtain candidate results for nonstandard input.
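The soft-attention step above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code: the embeddings `X` and `A` are random stand-ins for BERT hidden states, and the FCN classification layer is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_features(X, A, d):
    """Soft attention between input token embeddings X (n x d)
    and standard-label embeddings A (m x d)."""
    theta = X @ A.T / np.sqrt(d)    # attention scores Theta, shape (n, m)
    omega = softmax(theta, axis=-1) # attention weights Omega, rows sum to 1
    F = omega @ A                   # attention features F, shape (n, d)
    return omega, F

# toy embeddings standing in for BERT's last hidden layer (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 input tokens
A = rng.normal(size=(6, 8))  # 6 standard labels
omega, F = attention_features(X, A, 8)
```

Each row of `omega` is a distribution over the standard labels, so the feature for each input token is a weighted mixture of label embeddings.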

Build the TCM knowledge graph
We build the TCM knowledge graph from TCM theory books under the guidance of specialists. The concepts and relationships in our TCM knowledge graph are shown in Figure 2. There are three types of relationships between symptoms and prescriptions. A major symptom is significant for a syndrome, while an ancillary symptom occurs as an additional symptom of the syndrome. For a symptom that should not be treated by a specific prescription, we name the relationship a contraindicated symptom.
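The three symptom-prescription relations can be stored as plain (head, relation, tail) triples. The following is a minimal sketch; the prescription and symptom names are illustrative placeholders, not entries from the actual graph.

```python
# (prescription, relation, symptom) triples; names are hypothetical examples
triples = [
    ("Prescription-A", "major_symptom", "fever"),
    ("Prescription-A", "major_symptom", "sweating"),
    ("Prescription-A", "ancillary_symptom", "headache"),
    ("Prescription-A", "contraindicated_symptom", "vomiting"),
]

def symptoms_of(prescription, relation):
    """Return all symptoms linked to a prescription by a given relation."""
    return [t for h, r, t in triples if h == prescription and r == relation]
```

In practice such triples would live in a graph database, but this flat form is enough to drive the scoring algorithm described below.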

Calculate the recommended score
Based on the knowledge graph, we propose an algorithm to calculate the recommendation score for each prescription and choose the top five prescriptions as the recommended results.
For a given patient's symptom set, we first calculate the hit counts of each prescription on every symptom, so that we can screen out prescriptions with a relatively large overlap in symptoms. When patients describe their symptoms, they may omit some associated symptoms, leaving the symptom set incomplete. We therefore also use symptom similarity in the hit counts to provide a more complete selection of symptoms. The similarity between two symptoms is stored in the knowledge graph as the value of a relation and combines language similarity with graph similarity. Language similarity is defined as the cosine similarity of language embeddings and measures the symptoms' semantic similarity in a general corpus:

Sim_lang(s_i, s_j) = (e_i · e_j) / (‖e_i‖ ‖e_j‖)

The graph similarity measures the similarity of symptom nodes in the knowledge graph, for which we utilize the Leacock-Chodorow similarity [24]:

Sim_graph(s_i, s_j) = -log( len(s_i, s_j) / (2D) )

where len(s_i, s_j) is the shortest path length between the two symptom nodes and D is the longest path length in the knowledge graph. We then calculate the recommendation score for each filtered prescription to determine the final result. For a given patient's symptom set S = {s_1, ..., s_n} and a candidate prescription p, with major symptom set S¹ = {s¹_1, ..., s¹_{m1}}, ancillary symptom set S² = {s²_1, ..., s²_{m2}}, and contraindicated symptom set S³ = {s³_1, ..., s³_{m3}}, the recommendation score is defined as:

Score(S, p) = Σ_{k=1}^{3} w_k |S ∩ S^k|

For major symptoms the weight w_1 = 1, for ancillary symptoms w_2 = 0.5, and for contraindicated symptoms w_3 = -1. This score measures the overlap between the patient's symptom set and the prescription's symptom sets, and it also encodes rules of TCM diagnosis: major symptoms and contraindicated symptoms should be considered first, while the influence of ancillary symptoms should be minimized.
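The two similarity measures can be sketched directly from their definitions. This is a hedged illustration: the embeddings are arbitrary vectors, and how the two similarities are combined into the final relation value follows the paper's description only in outline.

```python
import math
import numpy as np

def language_similarity(e1, e2):
    """Cosine similarity of two symptom embeddings."""
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def graph_similarity(path_len, D):
    """Leacock-Chodorow similarity: -log(len / (2D)),
    where D is the longest path length in the knowledge graph."""
    return -math.log(path_len / (2 * D))
```

For example, two directly connected symptom nodes (`path_len = 1`) in a graph with longest path `D = 10` get a graph similarity of `-log(1/20) = log(20)`, while identical embeddings get a cosine similarity of 1.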
The specific details of the entire process are as follows.
Algorithm 1: for each candidate prescription, compute its recommendation score; sort the filtered prescriptions by score; pick the Top-5 scores and their prescriptions as the result.
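The scoring and Top-5 selection can be sketched as follows, using the weights stated above (major = 1, ancillary = 0.5, contraindicated = -1). For brevity this sketch counts exact symptom matches only and omits the similarity-based hit expansion; the prescription data is a hypothetical toy example.

```python
WEIGHTS = {"major": 1.0, "ancillary": 0.5, "contraindicated": -1.0}

def recommend_score(patient_symptoms, prescription):
    """patient_symptoms: set of standard symptoms.
    prescription: dict mapping each symptom type to its symptom set."""
    score = 0.0
    for kind, weight in WEIGHTS.items():
        hits = patient_symptoms & prescription.get(kind, set())
        score += weight * len(hits)  # weighted overlap for this symptom type
    return score

def top5(patient_symptoms, prescriptions):
    """Rank candidate prescriptions by score and keep the Top-5."""
    scored = {name: recommend_score(patient_symptoms, p)
              for name, p in prescriptions.items()}
    return sorted(scored, key=scored.get, reverse=True)[:5]

# toy example (hypothetical data)
prescriptions = {
    "A": {"major": {"fever", "cough"}, "ancillary": {"headache"},
          "contraindicated": set()},
    "B": {"major": {"chills"}, "ancillary": set(),
          "contraindicated": {"fever"}},
}
ranking = top5({"fever", "headache"}, prescriptions)
```

Here prescription A scores 1 + 0.5 = 1.5 (one major hit, one ancillary hit) while B scores -1 (a contraindicated hit), so A is ranked first.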

Experimental Materials
We use different experimental materials for different subtasks, which are detailed as follows.

2.5.1 For training the standard word alignment model. We take several pre-trained Chinese language models [19,25,26] based on BERT to perform our text-alignment experiments. We utilize standard symptom labels compiled from medical literature [27] under the guidance of TCM specialists. We extract nonstandard descriptions from clinical cases and construct word pairs by manual annotation. In total, we use 9066 nonstandard-standard word pairs and 626 labels as training materials for the text-alignment model.

2.5.2 For building the TCM knowledge graph. We use TCM theory books to build the knowledge graph; each book contains rules for determining which category a prescription belongs to, assigning a prescription to a symptom set, and selecting medicines to make up a prescription. The numbers of entities and relationships in our TCM knowledge graph are listed in Table 1 and Table 2.

2.5.3 For performing automatic diagnosis. We take 3200 clinical cases compiled from ancient Chinese TCM books as our automatic diagnosis dataset. Each case includes a symptom set, prescription labels, and syndrome channels for those prescriptions. These cases contain nonstandard natural language and are converted by our standard word alignment model.

Evaluation metrics
The standard word alignment experiment can be considered a multi-label classification task; therefore, we use recall, precision, and F1 score to measure the performance of different models.
For the automatic diagnosis experiment, we offer 5 recommended prescriptions for each patient. Therefore, we employ top-N accuracy to measure the effectiveness of our method.

Comparative results of alignment experiments
The experimental results for the standard word alignment experiment are summarized in Table 3. Several Chinese pre-trained language models are used in this experiment. We can observe that the Chinese-BERT-wwm-ext [21] model outperforms the other models in all metrics, with an average improvement of 1%. Therefore, this model is used to convert nonstandard words in our diagnosis system for the automatic diagnosis experiments.

Experimental Results of Automatic Diagnosis
The experimental results for the automatic diagnosis experiment are shown in Figure 3. To better demonstrate the effectiveness of our method, we employ two other methods for comparison.
Our method is denoted as Method I; the method that does not count hits for similar symptoms is denoted as Method II; and the method that ignores the difference between major, ancillary, and contraindicated symptoms is denoted as Method III.
The following observations can be made from the results: (1) Our method outperforms the two comparative methods across all recommended results, with an improvement of nearly 30% in the top-1 recommended result. (2) There is only a subtle difference between the performance of Method II and Method III, with Method II performing slightly better. These results suggest that both counting hits for similar symptoms and distinguishing between symptom types contribute to the overall performance of our method.

CONCLUSION
In this paper, we introduce a novel method for constructing a TCM automatic diagnosis system by leveraging a knowledge graph and a BERT-based pre-trained language model. Our approach aligns the patient's input with standard words in our knowledge graph, enabling us to calculate recommendation scores for prescriptions based on the similarity between the patient's standard symptom set and the symptom sets of the prescriptions. We conducted several experiments to identify the most effective Chinese base model for our method and utilized it for automatic diagnosis. The results demonstrate that our method achieves the highest accuracy among the comparative methods, which could inspire future artificial intelligence medical assistants. However, our method does have some limitations. We only fine-tuned the language models on our task-specific dataset for standard word alignment, and their ability to express TCM semantics needs further enhancement through pre-training tasks. Additionally, we acknowledge the need to continue mining more TCM rules to improve our algorithm for calculating recommendation scores and to achieve more accurate results.

Figure 1 :
Figure 1: Overall structure of TCM automatic diagnosis system

Figure 2 :
Figure 2: Concepts and relationships in TCM knowledge graph

Figure 3 :
Figure 3: Top-N accuracy of automatic diagnosis for different methods

Table 1 :
Number of Entities in TCM knowledge graph

Table 2 :
Number of Relationships in TCM knowledge graph

Table 3 :
Experimental results of the standard word alignment experiment