GRAPPEL: A Graph-based Approach for Early Risk Assessment of Acute Hypertension in Critical Care

An acute hypertensive episode (AHE) refers to a period of extremely high blood pressure (BP) that can arise suddenly in critical care and, if not identified early, can subject patients to the risk of severe organ damage and even death. Early assessment of AHE risk can save lives by enabling proactive medical intervention. We propose GRAPPEL, a novel graph-based approach that assesses a patient's risk of experiencing an AHE before it occurs, based on an analysis of their BP recorded over time in critical care. Our algorithm consists of two major components: (1) the construction of a time-evolving graph representation of a patient's time-series BP data that encodes the temporal BP variations into a graph and (2) the generation of real-time AHE risk scores by quantifying the graph changes at each time step, triggered by the arrival of a new BP record. Notably, GRAPPEL provides real-time and early AHE risk assessment based solely on BP records that can be irregularly spaced in time, making it suitable for critical care environments. Through extensive experiments on 3,476 critical-care visit records, we demonstrate the superiority of our approach over existing methods, achieving an AUC-ROC score of 91% in identifying patients at risk of experiencing an AHE up to 170 minutes in advance (and an AUC-ROC score of 94% up to 20 minutes in advance).


INTRODUCTION
In critical care settings, such as surgery and intensive care units (ICUs), early assessment of health risks is of utmost importance because (1) the health risks observed in critical care are severe and often life-threatening and (2) the condition of patients in these environments deteriorates rapidly, leaving very little time for life-saving medical interventions if detection is delayed. Thus, early risk assessment gives healthcare professionals the time needed to evaluate the risks, rapidly determine a course of medical intervention, and proactively administer it, thereby preventing a severe medical event from occurring. One condition in critical care units that exemplifies this urgency is an acute hypertensive episode (AHE) [11]. As shown in Figure 1, AHE refers to a period of high blood pressure (BP) which, if not identified early, can lead to severe consequences, including irreversible organ damage, stroke, and even death [11,17]. Early risk assessment of AHE enables doctors to proactively administer milder interventions, e.g., oral BP-lowering agents, to prevent such episodes from occurring in the first place.
Previous research efforts for hypertension prediction include clinical threshold-based methods [16,17], machine learning (ML) [4,10,12,13], and deep learning (DL) [14], but these methods have limitations. For example, as illustrated in Figure 1, threshold-based methods may mistakenly associate isolated BP spikes above a predefined threshold with AHE, resulting in false positives. Such isolated spikes can be caused by phenomena such as the white-coat syndrome [9], in which a patient experiences a temporary BP spike due to anxiety in a clinical setting without necessarily being at risk of a future AHE. ML methods struggle to process time-series data and typically rely on static statistical features computed over batches of data, which restricts their ability to provide real-time and early risk assessment. DL methods, such as long short-term memory (LSTM) networks, show promise in achieving high prediction accuracy, but their performance diminishes on irregularly spaced time-series data because they rely heavily on regularly spaced time steps.
Previous studies have highlighted the effectiveness of risk-scoring methods, such as the early warning score (EWS) [2], in facilitating early risk assessment. These methods are based on the premise that the decline in patient health within critical care settings often manifests as abnormal deviations in continuously monitored physiological vitals such as pulse, BP, and respiratory rate. By analyzing the time-series data of these vitals, prolonged abnormal deviations from the expected normal values can be detected and flagged as "high risk." Real-time and early risk-scoring algorithms can therefore alert healthcare professionals, enabling them to administer timely medical interventions that prevent adverse outcomes and mortality. Moreover, this approach can help mitigate the limited availability of qualified staff in critical care.
Building upon this concept, we propose GRAPPEL, a novel graph-based method for the real-time and early risk assessment of patients at risk of experiencing AHE, prior to its occurrence. To the best of our knowledge, GRAPPEL is the first method to employ a time-evolving graph-based approach for real-time and early risk scoring in the context of AHE. Our algorithm relies only on BP data, which can be irregularly spaced in time, mirroring the realistic data availability in critical care units.
We outline our contributions as follows:
• A novel methodology for real-time and early risk assessment of AHE that offers the following features:
  - A time-evolving graph representation of a patient's time-series BP data that encodes the temporal variations of BP into a graph.
  - Real-time AHE risk scores that quantify changes in the graph at each time step, triggered by the arrival of a new BP record.
  - The ability to use sparse and irregularly spaced BP records.
• A rigorous evaluation of the AUC-ROC score, sensitivity, specificity, and early risk assessment capability of GRAPPEL, compared to existing methods, using the MIMIC-III [7] dataset.
The rest of this paper is organized as follows. First, Section 2 provides background about the MIMIC-III dataset and AHE, along with a review of prior research. Next, Section 3 describes the GRAPPEL methodology, followed by Section 4, which evaluates GRAPPEL and compares it to existing state-of-the-art baseline methods. Finally, Section 5 concludes and outlines future work.

BACKGROUND AND RELATED WORK
Here we provide an overview of the MIMIC-III [7] dataset that GRAPPEL will analyze; elaborate on the specific problem we are addressing, namely the acute hypertensive episode (AHE); and review the prior studies conducted in this field.

MIMIC-III dataset
We use the MIMIC-III dataset, which contains the physiological vital-sign records of over 40,000 critical care patients. For our analysis, evaluation, and comparisons against the baseline methods, we focus solely on the mean blood pressure (BP) recorded over time.
The MIMIC-III dataset includes three types of mean BP values: (1) mean arterial blood pressure (ABPm) and (2) mean arterial blood pressure (ARTm), both acquired invasively from the radial artery of either hand, and (3) mean non-invasive blood pressure (NBPm), gathered using non-invasive techniques. Figure 2 depicts each step of our preprocessing pipeline for the MIMIC-III dataset.

Early risk assessment of AHE in critical care
Before analyzing the data for AHEs, we give a precise definition of an AHE. Following similar approaches [3,8], we define an AHE as a 30-minute time window during which the BP data points remain above 100 mmHg for at least 90% of the duration. Since the data points in the MIMIC-III dataset are irregularly spaced, a 30-minute window can contain as few as one BP data point. To address this variability, we introduce a window of interest (WoI): a 30-minute time window containing at least 10 BP data points. If a WoI meets the AHE criterion (BP above 100 mmHg for at least 90% of the duration), it is considered an AHE window; otherwise, it is a non-AHE window.
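The windowing rule above can be made concrete with a small labeling function. This is a minimal sketch, not the authors' code: the function name `label_window` is hypothetical, and it assumes the 90% criterion is evaluated as the fraction of BP records above 100 mmHg (the text phrases it as a fraction of the duration, which, for irregular sampling, could instead be computed in a time-weighted way).

```python
# Constants taken from the AHE / WoI definition above.
AHE_BP_THRESHOLD_MMHG = 100.0
AHE_FRACTION = 0.9
WOI_MINUTES = 30.0
WOI_MIN_POINTS = 10


def label_window(times_min, bp_mmhg):
    """Label a 30-minute window as "AHE", "non-AHE", or None (not a valid WoI).

    times_min: record times in minutes; bp_mmhg: mean BP values (same length).
    Assumption: the 90% rule is applied to the fraction of records above
    100 mmHg rather than to a time-weighted fraction of the window.
    """
    if len(times_min) != len(bp_mmhg):
        raise ValueError("times and BP values must have the same length")
    if len(bp_mmhg) < WOI_MIN_POINTS:
        return None  # too few records to qualify as a window of interest
    if max(times_min) - min(times_min) > WOI_MINUTES:
        raise ValueError("records span more than 30 minutes")
    above = sum(1 for v in bp_mmhg if v > AHE_BP_THRESHOLD_MMHG)
    return "AHE" if above / len(bp_mmhg) >= AHE_FRACTION else "non-AHE"
```

With 12 records, 11 above 100 mmHg (about 92%) make an AHE window, while 10 above (about 83%) do not.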
We further partition the time leading up to the WoI into two segments: the observation window (O) and the early risk assessment window (E). The observation window (O), ranging from time $t_0$ to $t_O$, serves as the primary input for our algorithm and the baseline methods. It comprises the time-series BP data and is used to assess how effectively GRAPPEL and the baselines predict the occurrence of an AHE before it actually occurs at time $t_{\mathrm{WoI}}$.
In contrast, the early risk assessment window (E) is a flexible time window specifically designed to evaluate the early risk assessment capabilities of our algorithm and the baseline methods. Varying the interval between $t_O$ and $t_{\mathrm{WoI}}$ lets us evaluate how far in advance a model can anticipate the occurrence of an AHE or, in our case, identify a high risk of AHE before it actually occurs at time $t_{\mathrm{WoI}}$. The length of E thus represents the degree of earliness relative to $t_{\mathrm{WoI}}$. (Note that the input to the models is always the time-series BP data from $t_0$ to $t_O$.)
Figure 3 summarizes the three time windows that we use for the early risk assessment of AHE in this paper.

Prior art
AHE prediction has received relatively less attention than the prediction of acute hypotensive episodes, which was addressed as a challenge in the 2009 Computers in Cardiology competition [8]. Nonetheless, predicting AHE remains a significant research problem [6,15]. Recently, researchers have proposed machine learning (ML) methods for hypertension detection, including SVM [12], KNN [13], XGBoost [4], and random forest [10]. However, these studies have primarily focused on non-critical patients, for whom early detection or risk assessment specifically related to AHE is not the priority. Furthermore, these studies often rely on large datasets derived from wearable devices or electronic health records (EHRs). In critical care settings, however, data availability is limited, and physiological vitals are typically recorded from the moment the patient is admitted to the unit. This reduced data availability can pose challenges for ML methods. Additionally, ML approaches commonly rely on static patient features, such as the mean and standard deviation of physiological vitals and other clinical information, which do not account for temporal changes. As a result, these ML methods require the entire batch of monitored vital data to be available at once, rendering real-time or early assessment impractical.
Recent deep learning (DL) models, such as LSTM [14], have demonstrated promising results in hypertension detection using time-series BP data, achieving accuracy above 90%. However, their efficacy diminishes rapidly when data are not sampled at regular intervals: LSTM models rely heavily on consistent and continuous data inputs over time to maintain their performance.

THE GRAPPEL METHODOLOGY
Figure 4 shows GRAPPEL, our graph-based approach for early risk assessment of AHE in critical care. It consists of two primary steps: (1) the construction of a time-evolving graph from time-series BP data and (2) the generation of AHE risk scores at each time step to facilitate real-time and early assessment of AHE risk.
We construct the time-evolving graph to capture the variations of a patient's BP over time, encoding the patient's BP dynamics in a graph structure. The acute deterioration of a patient's health, such as an AHE, is often preceded by abnormal deviations in their physiological vitals, in this case, BP elevation over time. Prolonged BP elevation is encoded as significant changes in the time-evolving graph; both the elevation and these graph changes serve as indicators of a high risk of AHE occurrence. Section 3.1 describes the construction of the time-evolving graph and explains how its structure encodes temporal fluctuations in BP.
Generating the AHE risk score at each time step involves quantifying the changes in the graph triggered by the arrival of a new BP record at that time step. Section 3.2 explains this process and shows that prolonged BP elevation, together with the corresponding changes in the graph, results in a high AHE risk score, which signifies an increased likelihood of a future AHE.
1). This calibration is crucial for capturing the elevated BP characteristics associated with AHE and ensuring positive risk scores for future assessments.
At each time step $t > 0$, a new node $n_t$ and corresponding edges $E_t$ are introduced to expand the existing graph $G_{t-1}$, resulting in the updated graph $G_t$. The newly added node $n_t$ represents the BP data point recorded at time $t$, while the edges $E_t$ connect $n_t$ with the preexisting nodes in $G_{t-1}$, which represent BP data points recorded before time $t$.
The existence of an edge $e_{tj} \in E_t$ between node $n_t$ and an existing node $n_j$ in $G_{t-1}$ is determined by a distance criterion $d_{tj}$: the edge exists if $d_{tj} > 0$. The criterion $d_{tj}$ incorporates (1) the BP values associated with nodes $n_t$ and $n_j$, reflecting their magnitudes, and (2) the temporal proximity of $n_t$ and $n_j$, i.e., the time interval between them. A higher value of $d_{tj}$ indicates that nodes $n_t$ and $n_j$ have elevated BP values and are close to each other in time. Consequently, nodes with higher BP values tend to have more edges (a higher degree) and to be connected to nodes representing records close in time. The formulation of criterion $d_{tj}$ is given in Equation (2). The level of connectivity between node $n_t$ and temporally distant nodes is determined by the time scaling factor $\alpha$: increasing $\alpha$ establishes connections between $n_t$ and nodes further away in time. The entire graph becomes fully connected as $\alpha$ tends to infinity, whereas as $\alpha$ approaches zero, the graph has no edges.
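Because Equation (2) is not reproduced in this text, the sketch below uses a hypothetical criterion, d = α·(b_t + b_j)/(2·b_ref) − |t_t − t_j|, chosen only because it has the properties described above: it grows with the BP values of both nodes, shrinks with their time gap, connects every pair as α → ∞, and connects no pair with distinct timestamps as α → 0. The class `TimeEvolvingGraph`, the reference value `B_REF_MMHG`, and the criterion itself are illustrative assumptions, not GRAPPEL's exact formulation.

```python
from dataclasses import dataclass, field

B_REF_MMHG = 100.0  # hypothetical reference BP used to normalize BP values


def edge_criterion(b_t, b_j, t_t, t_j, alpha):
    """Hypothetical stand-in for Equation (2); an edge exists when it is > 0.

    Larger for higher BP values, smaller for larger time gaps (minutes).
    """
    bp_term = (b_t + b_j) / (2.0 * B_REF_MMHG)  # grows with BP magnitude
    return alpha * bp_term - abs(t_t - t_j)      # penalized by the time gap


@dataclass
class TimeEvolvingGraph:
    alpha: float                                # time scaling factor
    times: list = field(default_factory=list)  # record times (minutes)
    bps: list = field(default_factory=list)    # BP value of each node
    adj: list = field(default_factory=list)    # neighbor sets, one per node

    def add_record(self, t, bp):
        """Add node n_t for a new BP record and connect it to earlier nodes."""
        new_id = len(self.bps)
        neighbors = set()
        for j, (t_j, b_j) in enumerate(zip(self.times, self.bps)):
            if edge_criterion(bp, b_j, t, t_j, self.alpha) > 0:
                neighbors.add(j)
                self.adj[j].add(new_id)  # undirected: update the old node too
        self.times.append(t)
        self.bps.append(bp)
        self.adj.append(neighbors)
        return new_id

    def degree(self, i):
        return len(self.adj[i])
```

For example, with α = 60 and records at 0, 30, and 100 minutes (BP 100, 100, and 140 mmHg), the third node links to the second (72 − 70 = 2 > 0) but not to the first (72 − 100 < 0).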

Real-time AHE risk scoring
After the time-evolving graph has been updated with a new node (BP record), we calculate the AHE risk score for that time step. To explain our real-time AHE risk scoring, we first define a node's local outlier factor (LOF) and time weightage factor (w) as follows:
Definition 1. Local Outlier Factor (LOF). In a time-evolving graph $G_t$ at time $t$, the local outlier factor $\mathrm{LOF}_i$ of a node $n_i$, where $i \in [1, t]$, is computed from $\deg(n_i)$, the node degree of $n_i$; $k$, the total number of neighbors of $n_i$; and $\deg(n_j)$, the node degree of each neighbor $n_j$.
Definition 2. Time Weightage Factor (w). In a time-evolving graph $G_t$ at time $t$, the time weightage factor $w_i$ of a node $n_i$, where $i \in [1, t]$, is parameterized by a weight scaling factor $\beta$, which scales the AHE risk score from decimal values to readable integer ranges.
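Since the formulas for Definitions 1 and 2 are not reproduced in this text, the following sketch shows one hypothetical instantiation consistent with the surrounding description: LOF equals 1 when a node's degree matches the mean degree of its preceding neighbors and grows as they diverge, and the time weight rises linearly toward the newest node, scaled by β. Here k counts only preceding neighbors, an assumption motivated by the later description of LOF as dissimilarity with preceding nodes. Both forms and the function names are assumptions for illustration.

```python
def local_outlier_factor(adj, i):
    """Hypothetical LOF for node i (0-based, nodes indexed in arrival order).

    adj[i] is the set of neighbors of node i in an undirected graph; j < i
    means n_j precedes n_i. Returns a symmetric ratio between deg(n_i) and
    the mean degree of its k preceding neighbors: 1 when they match, larger
    as they diverge.
    """
    preceding = [j for j in adj[i] if j < i]
    k = len(preceding)
    if k == 0:
        return 1.0  # assumption: a node with no preceding neighbors is neutral
    deg_i = len(adj[i])
    mean_deg = sum(len(adj[j]) for j in preceding) / k
    return max(deg_i, mean_deg) / min(deg_i, mean_deg)


def time_weight(i, t, beta):
    """Hypothetical time weightage factor w_i for i in [1, t] (1-based).

    A linear ramp: the newest node (i = t) gets weight beta.
    """
    if not 1 <= i <= t:
        raise ValueError("i must lie in [1, t]")
    return beta * i / t
```

On the path graph 0–1–2, node 1 (degree 2, preceded by a degree-1 neighbor) gets LOF 2, whereas in a fully connected three-node graph every node gets LOF 1.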
The AHE risk score at time $t$ is calculated from the following values associated with each node $n_i$ in the graph $G_t$, where $i \in [1, t]$: (1) the time weightage factor $w_i$, (2) the BP value of $n_i$, and (3) the local outlier factor $\mathrm{LOF}_i$.
First, the time weightage factor (w) of each node reflects the node's importance based on its temporal position in the graph. Nodes closer to time $t$ have a higher w and contribute more to the risk score, because our focus is real-time AHE risk assessment. Second, the BP value associated with each node reflects the magnitude of BP elevation. Nodes with high BP values contribute more to the risk score, because we regard elevated BP as a sign of AHE risk. Third, the local outlier factor (LOF) of each node reflects its dissimilarity from its preceding nodes in terms of BP value or node degree (nodes with higher BP values have higher node degrees). Nodes that are similar to their preceding nodes have lower LOF values and contribute more to the risk score, because we regard prolonged BP elevation as a sign of AHE risk.
Previous studies, such as [17], and our analysis of the MIMIC-III dataset indicate that the rate and magnitude of BP elevation may be at least as important as the absolute BP value in assessing the risk of AHE. We observe that an AHE is usually preceded by a prolonged elevation in BP rather than a sudden spike and fall, as shown in Figure 5, and LOF lets us distinguish the former from the latter. LOF has previously been used for graph anomaly detection [1]; we modify it slightly for our AHE risk assessment task.
We calculate the AHE risk score at time $t$ by combining these three quantities over all nodes of $G_t$.
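To show how the pieces interact at each time step, here is an end-to-end sketch that updates the graph on every new record and recomputes a score. Every formula in it is an assumption, not the paper's equation: a hypothetical edge criterion α·(b_t + b_j)/(2·b_ref) − |Δt| > 0, a hypothetical LOF (symmetric ratio between a node's degree and the mean degree of its preceding neighbors), a linear time weight β·i/t, and the combination score_t = (1/t)·Σ_i w_i·b_i/LOF_i, so that recent nodes, higher BP values, and lower LOF values all raise the score. Its outputs are not comparable to the threshold reported in Section 4.

```python
B_REF_MMHG = 100.0  # hypothetical BP normalizer for the edge criterion


def risk_scores(times, bps, alpha, beta):
    """Return a hypothetical AHE risk score after each new BP record.

    times: record times in minutes (increasing); bps: BP values (mmHg).
    """
    adj = []     # neighbor sets of the time-evolving graph
    scores = []
    for t_idx, (t_new, b_new) in enumerate(zip(times, bps)):
        # 1) Graph update: connect the new node to qualifying earlier nodes.
        nbrs = set()
        for j in range(t_idx):
            d = alpha * (b_new + bps[j]) / (2 * B_REF_MMHG) - abs(t_new - times[j])
            if d > 0:
                nbrs.add(j)
                adj[j].add(t_idx)
        adj.append(nbrs)

        # 2) Scoring: recompute LOF for every node, since degrees change.
        t = t_idx + 1
        total = 0.0
        for i in range(t):
            prev = [j for j in adj[i] if j < i]
            if prev:
                deg_i = len(adj[i])
                mean_deg = sum(len(adj[j]) for j in prev) / len(prev)
                lof = max(deg_i, mean_deg) / min(deg_i, mean_deg)
            else:
                lof = 1.0  # no preceding neighbors: treated as neutral
            w = beta * (i + 1) / t  # linear time weight, newest node = beta
            total += w * bps[i] / lof
        scores.append(total / t)
    return scores


print(risk_scores([0, 30, 100], [100, 100, 140], alpha=60.0, beta=1.0))
```

The design choice worth noting is that earlier nodes' LOF values are recomputed at every step, because adding a node changes its neighbors' degrees; a real-time implementation could update only the affected nodes instead.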

RESULTS AND EVALUATION
To evaluate the effectiveness of GRAPPEL and compare it against the existing state-of-the-art baseline methods, we use three key metrics: AUC-ROC score, specificity, and sensitivity.Additionally, we test the methods for their early risk assessment capabilities.

Experimental Setup
To evaluate our methodology, we first divide a patient's time-series BP data into windows as shown in Figure 3 and then cast the problem of early risk assessment of AHE as a classification problem.
If the window of interest (WoI) is an AHE window, we consider the patient a positive-class sample; if the WoI is a non-AHE window, the patient is a negative-class sample. Because the BP records in the MIMIC-III dataset are irregularly spaced in time, it is difficult to define the observation window (O) and the early risk assessment window (E) in minutes. For instance, setting a window to 30 minutes would give some patients over 40 data points in O and others only one. We therefore define O and E by the number of BP data points rather than by minutes.
We set the total number of data points in O and E combined to 60, ensuring that O comprises at least 10 data points. We then slide E over the range {0, 1, ..., 50} data points, so that O correspondingly ranges over {60, 59, ..., 10} data points.
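The split by data points can be sketched as follows. The function name is hypothetical, and the sketch assumes that the 60 records used are the ones immediately preceding the WoI, with O taking the earlier part and E the later part.

```python
TOTAL_POINTS = 60    # |O| + |E|, from the setup above
MIN_OBS_POINTS = 10  # O always keeps at least 10 points


def split_observation_and_early(pre_woi_records, e_points):
    """Split the 60 records preceding the WoI into (O, E).

    pre_woi_records: chronologically ordered records before the WoI; only
    the last 60 are used. e_points: size of E in data points (0..50).
    """
    if not 0 <= e_points <= TOTAL_POINTS - MIN_OBS_POINTS:
        raise ValueError("E must contain between 0 and 50 data points")
    if len(pre_woi_records) < TOTAL_POINTS:
        raise ValueError("need at least 60 records before the WoI")
    window = pre_woi_records[-TOTAL_POINTS:]
    split = TOTAL_POINTS - e_points
    return window[:split], window[split:]  # O is earlier, E is later
```

Sliding E from 0 to 50 data points is then a loop over `e_points in range(51)`, feeding only O to each model.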
To make our results easier to read, as discussed in Section 4.4, we convert the "number of BP data points" scale to a "time (in minutes) before WoI" scale. We achieve this conversion by calculating the duration of E (the time between the end of O and the start of the WoI) for every patient and then averaging it across all patients. For instance, if 60 data points correspond to 200 minutes for the first patient and to 100 minutes for the second, the average duration of 60 data points is 150 minutes. The x-axis of Figure 7 thus represents the average duration of E, i.e., the time before the WoI in minutes.
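A sketch of the minutes conversion, assuming the duration of E for a patient is measured from the last record in O to the start of the WoI; the function names are hypothetical. With E durations of 200 and 100 minutes for two patients, the average is 150 minutes, matching the arithmetic of the example above.

```python
from statistics import mean


def e_duration_minutes(o_times, woi_start):
    """Duration of E for one patient: last O record to the WoI start (minutes)."""
    return woi_start - o_times[-1]


def average_minutes_before_woi(patients):
    """Average E duration across patients.

    patients: iterable of (o_times, woi_start) pairs, times in minutes.
    """
    return mean(e_duration_minutes(o, w) for o, w in patients)
```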
Finally, we divide our dataset into training and testing data with a 60-40 ratio. For training, we set E to 0 and O to 60 data points, allowing the methods to use the full duration of the time-series BP data for learning. Since our methodology produces risk scores rather than class labels, we introduce a threshold variable to turn it into a classifier, and we determine the optimal threshold on the training data. If the AHE risk score exceeds the threshold at any time within O, we classify the WoI as an AHE window, i.e., the patient is predicted to have an AHE. Additionally, we use the training dataset to find the optimal values for , , and . Through experimentation, we heuristically determine that our optimal threshold is 24.7, while the optimal values for , , and  are 60, 100, and 80, respectively.
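The thresholding step can be sketched as below. The text does not state how the optimal threshold is chosen, so this sketch assumes Youden's J statistic (sensitivity + specificity − 1) over candidate thresholds taken from the training patients' peak scores; the function names are hypothetical.

```python
def classify(scores_in_o, threshold):
    """Predict an AHE window (True) if the score exceeds the threshold anywhere in O."""
    return any(s > threshold for s in scores_in_o)


def select_threshold(train_scores, train_labels):
    """Pick the threshold maximizing Youden's J on the training patients.

    train_scores: one risk-score sequence over O per patient;
    train_labels: True for AHE windows, False otherwise.
    """
    peaks = [max(s) for s in train_scores]  # classify() depends only on the peak
    n_pos = sum(1 for y in train_labels if y)
    n_neg = len(train_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("training data must contain both classes")
    candidates = [min(peaks) - 1.0] + sorted(set(peaks))
    best_threshold, best_j = None, float("-inf")
    for c in candidates:
        tp = sum(1 for p, y in zip(peaks, train_labels) if p > c and y)
        tn = sum(1 for p, y in zip(peaks, train_labels) if p <= c and not y)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = c, j
    return best_threshold
```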

Existing classification methods
While there is limited work on predicting AHE, we compare GRAPPEL against the most popular models used for hypertension detection, as discussed in Section 2.

4.2.1
Statistical: To evaluate the MEAN method, we compute the moving average of the time-series BP data and make a positive AHE prediction if the moving average exceeds a threshold. We select the threshold through ROC curve analysis on the training dataset.
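A minimal sketch of the MEAN baseline, assuming a trailing moving average whose window length k is not given in the text (k = 5 below is an arbitrary placeholder); the threshold would come from ROC analysis on the training set. Function names are hypothetical.

```python
def moving_average(values, k):
    """Trailing moving average over the last k records (fewer at the start)."""
    if k < 1:
        raise ValueError("k must be positive")
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= k:
            running -= values[i - k]  # drop the record leaving the window
        out.append(running / min(i + 1, k))
    return out


def mean_baseline_predict(bp_in_o, threshold, k=5):
    """Predict AHE if the moving average of BP in O ever exceeds the threshold."""
    return any(m > threshold for m in moving_average(bp_in_o, k))
```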

4.2.2
Machine Learning (ML): The ML methods do not incorporate the temporal component of the data into their models, so the time-series BP data must first be transformed into a feature space. Following an approach similar to that described in [3], we convert our BP data into 15 statistical features: the mean, the standard deviation, and 13 histogram-based features giving the frequency of measurements in each of 13 bins. The bin boundaries are defined as [10, 60, 80, 90, 95, 100, 105, 110, 120, 130, 140, 170, 200, 250]. This binning strategy assigns more bins to elevated BP values than to lower BP values.
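The 15-feature transformation can be sketched as follows, using the bin boundaries listed above (14 boundaries give 13 bins). Assumptions not stated in the text: bins are half-open [lo, hi) with the last bin closed at 250 mmHg, values outside [10, 250] are not counted, and the standard deviation is the population one. The function name is hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev

BIN_EDGES = [10, 60, 80, 90, 95, 100, 105, 110, 120, 130, 140, 170, 200, 250]
N_BINS = len(BIN_EDGES) - 1  # 13 bins


def bp_features(bp):
    """Return 15 features: mean, standard deviation, and 13 histogram counts."""
    counts = [0] * N_BINS
    for v in bp:
        if v < BIN_EDGES[0] or v > BIN_EDGES[-1]:
            continue  # assumption: out-of-range values are not counted
        # bisect_right finds the bin [lo, hi); 250 itself goes into the last bin.
        idx = min(bisect_right(BIN_EDGES, v) - 1, N_BINS - 1)
        counts[idx] += 1
    return [mean(bp), pstdev(bp)] + counts
```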

4.2.3
Deep Learning (DL): In time-series classification, long short-term memory (LSTM) is a commonly employed technique with promising results, as discussed in Section 2.3. However, LSTM expects regularly spaced data points, which complicates training and testing on our data. To overcome this limitation, we include timestamps as additional features alongside the BP values when feeding the data into the LSTM model. Additionally, to address the class imbalance of our dataset, we employ an oversampling technique [5] that yields a balanced training dataset for both the ML and DL models, preventing them from being biased toward the majority class. The LSTM architecture is configured with an input size of 2, a hidden size of 32, and an output size of 2. For training, we use the Adam optimizer with a learning rate of 0.001 and the cross-entropy loss function. The model is trained for 200 epochs.
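The input construction and class balancing can be sketched with the standard library. Assumptions: timestamps are encoded as minutes since the first record, and plain random oversampling stands in for the technique cited as [5], which may differ. The LSTM itself (input size 2, hidden size 32, output size 2, Adam with learning rate 0.001, cross-entropy loss, 200 epochs) would be built in a deep learning framework and is not shown; the function names here are hypothetical.

```python
import random


def to_lstm_input(times_min, bp):
    """Pair each BP value with its timestamp (minutes since the first record),
    giving 2 features per step to match the LSTM input size of 2."""
    t0 = times_min[0]
    return [[b, t - t0] for t, b in zip(times_min, bp)]


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes
    are as large as the biggest one (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_s, out_y = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_s.append(s)
            out_y.append(y)
    return out_s, out_y
```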

Evaluation
To evaluate the effectiveness of our model and the baseline methods for the early identification of AHE risk, we ask the following question: can a model, given the observation window (O) as input, predict whether the window of interest (WoI) is an acute hypertensive episode (AHE) window, as early as the start of the early risk assessment window (E)? We define the positive class as the time-series BP data of patients whose WoI is an AHE window and the negative class as those whose WoI is a non-AHE window. We measure each model's performance with three metrics: (1) AUC-ROC score, (2) specificity, and (3) sensitivity.
All of these metrics are computed for each observation window (O) size, from 10 to 60 data points. To present each model's best overall result and keep the comparison fair, we report the best AUC-ROC score of every model in Figure 6. We also plot the AUC-ROC score, sensitivity, and specificity for every value of O in Figure 7. Negative values on the x-axis reflect the time before the start of the WoI; for example, a metric at -30 minutes reflects how well the model assesses the risk of AHE 30 minutes in advance.
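The three metrics can be computed as below for each O size. This sketch computes AUC-ROC through its rank interpretation (the probability that a randomly chosen AHE window scores higher than a randomly chosen non-AHE window, with ties counting one half), which equals the area under the empirical ROC curve; the function names are hypothetical.

```python
def auc_roc(scores, labels):
    """AUC-ROC via pairwise comparisons; O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(preds, labels):
    """Return (TP / (TP + FN), TN / (TN + FP)) for binary predictions."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    tn = sum(1 for p, y in zip(preds, labels) if not p and not y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    return tp / (tp + fn), tn / (tn + fp)
```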

Discussion
As shown in Figure 6, GRAPPEL outperforms all the other models in AUC-ROC score and sensitivity and comes in a close second to random forest in specificity. XGBoost, random forest, and SVM perform relatively well but have higher specificity than sensitivity, meaning that they are better at identifying non-AHE windows than AHE windows. In critical care, both specificity and sensitivity matter, since any misdiagnosis can be fatal for the patient; GRAPPEL shows the smallest gap between the two.
While the AUC-ROC score reflects a method's overall performance, our primary goal is to evaluate the early detection capability of our model compared to the baseline models. Figure 7 shows the early detection performance of all the methods. The AUC-ROC score, sensitivity, and specificity of GRAPPEL remain stable along the x-axis, which represents the time before the WoI (which may be an AHE or a non-AHE window). The AUC-ROC score for GRAPPEL ranges from 0.91 at 170 minutes before the WoI to 0.94 at 20 minutes before it, and sensitivity and specificity follow a similar trend.
While the other baselines show a trend similar to that of GRAPPEL, LSTM shows a completely different pattern: its AUC-ROC score fluctuates substantially and peaks around 90 minutes before the WoI. Its specificity also rises closer to the WoI, while its sensitivity drops more rapidly than that of the other models.
Figure 5 illustrates the real-time AHE risk scores generated by GRAPPEL across the time-series BP data. In Figure 5a, the AHE risk score does not spike even though the patient's BP data show an isolated spike: the local outlier factor (LOF) of the nodes with high BP values raises the risk score only temporarily, and new nodes with lower BP values later bring it down again. In contrast, Figure 5b shows that sustained high-BP nodes around 20 minutes before the WoI incrementally increase the AHE risk score.

Figure 1: Motivation for early risk assessment of AHE

Figure 3: Early risk assessment of AHE

Figure 4: GRAPPEL: real-time generation of AHE risk scores using a time-evolving graph (figure annotations: 𝛼 is a time scaling factor; 𝜷 is a weight scaling factor)

Figure 5: AHE risk score of non-AHE and AHE patients

Figure 6: Comparison of GRAPPEL with the baselines