Adolescent Idiopathic Scoliosis Patient Subphenotyping for Surgical Planning and Improved Patient Outcomes

Adolescent idiopathic scoliosis (AIS) is a complex condition characterized by abnormal spinal curvature, and surgical intervention is often required to correct the deformity. However, there is significant variability in postoperative outcomes among AIS patients, suggesting the existence of distinct subgroups within this population. Using a comprehensive dataset, we employed hierarchical clustering analysis to identify subphenotypes within the AIS patient population. The statistical analysis revealed distinct subgroups characterized by unique radiographic features. Furthermore, our study demonstrated significant differences in patient-reported outcomes (PROs) among the subphenotypes, underscoring the clinical relevance of subphenotyping. This divergence in postoperative outcomes emphasizes the need for personalized treatment approaches based on individual patient characteristics. The findings of this study contribute to the current knowledge of AIS, offer insights into patient stratification, and have the potential to guide clinicians in tailoring interventions and optimizing surgical decision-making for improved patient care.


INTRODUCTION
Adolescent idiopathic scoliosis (AIS) is a spinal disorder that affects adolescents during their growth phase, causing abnormal curvature of the spine [33]. It affects around 1-4% of adolescents globally [5], with varying degrees of severity and impact on their quality of life. Although surgical interventions have shown success in correcting spinal deformities [22], there is significant diversity in post-surgery Patient-Reported Outcomes (PROs) among AIS patients. This diversity may be attributed to various factors, such as disparities in patient demographics, preoperative characteristics, surgical techniques utilized, and individual responses to the surgery. By identifying separate subphenotypes among the AIS patient population, we can gain valuable insights into these variations and develop customized treatment plans to maximize patient outcomes.
Precision medicine is becoming increasingly important because it personalizes medical treatments based on the unique characteristics of each patient [8,12,13,34]. With the abundance of clinical and demographic data at our disposal, data-driven approaches have emerged as valuable tools for uncovering hidden patterns and subtypes within complex diseases [10,19,24,26-29]. By applying these methods to patients with AIS after spinal deformity surgery, we can gain a deeper understanding of the factors that contribute to differing outcomes.
The main goal of this study is to use a data-driven approach to categorize adolescent idiopathic scoliosis patients who have undergone spinal deformity surgery into subgroups. We aim to identify distinct patient subgroups based on various preoperative and postoperative radiographic parameters. In addition, we seek to assess the associations between these subphenotypes and PROs.
The main contribution of our work is three-fold:
• Guidance for Surgical Decision-Making. Subphenotypes provide valuable insights into which surgical approaches may be more suitable for specific patient groups.
• Improved Personalized Medicine. Identifying subphenotypes will help clinicians tailor their treatment plans based on specific patient characteristics, optimizing the chances of successful outcomes and minimizing potential complications.
• Identification of Predictive Biomarkers. The data-driven analysis uncovers predictive biomarkers associated with distinct subphenotypes.

RELATED WORKS
In the field of AIS and PROs, there have been several ongoing research efforts. Some researchers conducted long-term follow-up studies to assess the outcomes of spinal deformity surgery in AIS patients [2], [7]. These studies examine factors such as self-image, mental health, quality of life, and functional outcomes over an extended period. Furthermore, researchers have developed predictive models and risk assessment tools to identify patients who are more likely to experience complications or have poorer outcomes following spinal deformity surgery [18], [31]. Various subphenotyping methods have been used to identify different subgroups of AIS patients. Identifying AIS subphenotypes can help improve the diagnosis, prognosis, and treatment of AIS. The methods for identifying AIS subphenotypes differ based on the purpose and the setting. Some of the methods include screening, classification, and subgrouping.
Screening methods aim to detect scoliosis in asymptomatic adolescents, usually in school-based programs or primary care settings. The most common screening methods are the forward bend test [9], the scoliometer measurement [6], and Moiré topography [30]. These methods help identify spinal curvature and asymmetry, but they cannot measure the Cobb angle or classify the curve type. Horne et al. [11] review the diagnosis and management of AIS, including the use of physical examination, radiographic assessment, and treatment options such as observation, bracing, and surgery. The authors classify scoliosis as congenital, neuromuscular, or idiopathic, and subdivide idiopathic scoliosis by age of onset: infantile, juvenile, and adolescent. They characterize AIS as a type of scoliosis that occurs in individuals who are 10 years of age or older, with a lateral curve of the spine greater than 10 degrees and vertebral rotation.
Classification methods aim to classify AIS based on the curve type, location, magnitude, flexibility, and balance. The most widely used classification method is the Lenke classification system. Lenke et al. [14] examine the effectiveness of selective fusions for patients with AIS and different curve patterns according to the Lenke classification system. The study highlights the significance of thorough preoperative planning and clinical evaluation. The authors introduce a triad classification system for AIS that includes a curve type, a lumbar spine modifier, and a sagittal thoracic modifier. The research analyzes the success of selective thoracic or thoracolumbar/lumbar fusions in various curve patterns and concludes that such fusions can enhance the mobility of spinal segments in AIS patients. Other classification methods include the King-Moe classification [4] and the Risser-Ferguson method [25]. Although the Lenke classification system is considered comprehensive, reliable, and treatment-based, its shortcomings include limited applicability to patients who may have different curve patterns, degenerative changes, and global alignment issues. It may also show some interobserver and intraobserver variability in measuring the Cobb angles and identifying the end vertebrae. To mitigate this concern, methods that use computational techniques to learn from data and make classifications have been investigated. Liu et al. [15] use DeepLab V3+ for spine vertebrae segmentation to automatically measure the Cobb angle from spinal radiographs of AIS patients. Nonetheless, the model uses only frontal images and does not combine frontal and sagittal plane spinal images.
Subgrouping methods aim to identify subgroups of patients with AIS based on their clinical, genetic, or radiographic characteristics. The most widely used method is cluster analysis, a statistical method that groups patients with AIS based on their similarities. Pasha and Flynn [21] used K-means clustering of isotropically scaled 3D spinal curves to provide an effective, data-driven method for the classification of AIS patients.
Although the studies highlighted above report good performance, existing research in the field has some limitations. Existing studies often focus on evaluating overall outcomes of spinal deformity surgery in AIS patients without specifically exploring subphenotypes within the patient population. Additionally, many existing studies rely primarily on clinical and radiographic data, neglecting the potential integration of other data sources. This study aims to incorporate various types of data, such as patient-reported outcomes and demographic factors, to provide a more comprehensive understanding of AIS subphenotypes and their implications.

METHODOLOGY

Clustering
Clustering is an unsupervised machine learning and data mining technique that groups data points together based on their similarities and differences [16]. It does not rely on pre-existing labeled data; instead, it relies on the similarities and differences among the data points themselves to identify natural clusters. Hierarchical clustering is a type of clustering technique that creates a hierarchy of clusters by repeatedly merging or splitting them based on their similarity [23]. The two main types of hierarchical clustering are agglomerative (bottom-up) and divisive (top-down) [23]. Agglomerative methods start with each data point as a separate cluster and successively merge the closest clusters until a single cluster containing all the data points is formed, while divisive methods take the opposite approach [23].
A similarity function is used to quantify the similarity or dissimilarity between pairs of data points [20]. The choice of the similarity function depends on the type of data being clustered. Some commonly used similarity functions for structured data are distance functions (Euclidean or Manhattan), cosine similarity, and correlation distance.
Hierarchical clustering uses a linkage method to calculate the similarity or distance between clusters when combining or separating them. The selection of the linkage method can greatly affect the clusters that are produced. Some commonly used linkage methods are single linkage, complete linkage, centroid linkage, average linkage, and Ward's method.
The agglomerative hierarchical clustering method was used for AIS patient subphenotyping, with Euclidean distance, Eq. (1), as the similarity function and Ward's method, Eqs. (2) and (3), as the linkage method. Ward's linkage method minimizes the increase in the within-cluster sum of squared distances when merging clusters.

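For two points $A_i$ and $A_j$,
$$d(A_i, A_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{1}$$
where $d$ is the distance function between points and $x_i$, $y_i$, $z_i$ are the coordinates of the $i$-th point in 3 dimensions.
For two clusters $C_i$ and $C_j$,
$$\Delta(C_i, C_j) = \frac{|C_i|\,|C_j|}{|C_i| + |C_j|}\, d(m_i, m_j)^2 \tag{2}$$
where $\Delta$ is the distance function between clusters and $m_i$ is the mean of the $i$-th cluster. Specifically, we can define $m_i$ as:
$$m_i = \frac{1}{|C_i|} \sum_{A \in C_i} A \tag{3}$$
with $|C_i|$ denoting the number of points in $C_i$.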
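As a small worked illustration of the Euclidean distance and of Ward's merging cost in its commonly stated form, the sketch below checks numerically that the cost of merging two clusters equals the increase in the within-cluster sum of squared distances. The three points and the helper names (`euclidean`, `centroid`, `ward_cost`, `sse`) are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(pt[k] for pt in cluster) / n for k in range(len(cluster[0])))

def ward_cost(ci, cj):
    """Ward's merging cost: |Ci||Cj| / (|Ci| + |Cj|) * d(mi, mj)^2."""
    ni, nj = len(ci), len(cj)
    return ni * nj / (ni + nj) * euclidean(centroid(ci), centroid(cj)) ** 2

def sse(cluster):
    """Within-cluster sum of squared distances to the cluster mean."""
    m = centroid(cluster)
    return sum(euclidean(pt, m) ** 2 for pt in cluster)

# Hypothetical toy clusters in 3 dimensions.
c1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
c2 = [(4.0, 3.0, 0.0)]

cost = ward_cost(c1, c2)
increase = sse(c1 + c2) - sse(c1) - sse(c2)
print(round(cost, 6), round(increase, 6))
```

Both quantities agree for this toy example, which is the property that makes Ward's linkage a variance-minimizing merge rule.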

Statistical Analysis
Hypothesis testing is a statistical procedure widely used in scientific research, quality control, and business decision-making to draw conclusions about a population based on a sample of data. Hypothesis tests can be either parametric or nonparametric, depending on the assumptions and the types of data involved [32]. Parametric tests are suitable when we can assume that our sample data come from a specific probability distribution, usually the normal distribution [32]. These tests rely on parameters such as means and variances to make statistical inferences. Nonparametric tests, on the other hand, are used when we lack knowledge about the population distribution of the variable we are testing [32]. They are often used when the data do not meet the assumptions of parametric tests.
The hypothesis testing process involves setting up two hypotheses:
• Null Hypothesis (H0): the hypothesis that no significant difference exists between the observed data and what would be expected under a particular assumption.
• Alternative Hypothesis (Ha): the hypothesis that there is a significant difference between the observed data and what would be expected under a particular assumption.
After setting up these hypotheses, statistical tests are performed on the sample data to determine whether to reject or fail to reject the null hypothesis. The test results are expressed in terms of a p-value, which is the probability of obtaining a sample result as extreme as or more extreme than the observed result, assuming that the null hypothesis is true. A p-value below the chosen significance level leads to rejection of the null hypothesis in favor of the alternative hypothesis. Statistical tests involve comparing the observed data to a null hypothesis, which assumes that there is no significant difference or relationship between the variables being studied. Several statistical tests are available, each suited for different data types and research questions. Examples of parametric statistical tests are Student's t-tests, z-tests, Analysis of Variance (ANOVA), chi-square (χ²) tests, correlation tests, and regression analysis [32]. Examples of nonparametric tests include the Mann-Whitney U test, the Wilcoxon signed-rank test, the Kruskal-Wallis test, and Spearman's rank correlation coefficient [17]. The appropriate statistical test is chosen based on the nature of the data and the research question.
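To make the testing workflow concrete, the following minimal sketch runs a Mann-Whitney U test, a Kruskal-Wallis test, and a chi-square test of independence with SciPy. The scores and the contingency table are synthetic and purely hypothetical; the group names and the 0.05 threshold are assumptions for illustration, not values from the study.

```python
import numpy as np
from scipy import stats

# Synthetic, hypothetical outcome scores for three groups (not study data).
rng = np.random.default_rng(0)
group_a = rng.normal(4.0, 0.5, 40)
group_b = rng.normal(4.4, 0.5, 40)
group_c = rng.normal(4.1, 0.5, 40)

# Two independent groups, numerical variable: Mann-Whitney U test.
u_stat, p_two = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Three or more independent groups, numerical variable: Kruskal-Wallis test.
h_stat, p_three = stats.kruskal(group_a, group_b, group_c)

# Categorical variable: chi-square test on a hypothetical 2x2 table
# (rows are groups, columns are category counts).
table = np.array([[18, 22], [30, 10]])
chi2, p_cat, dof, expected = stats.chi2_contingency(table)

alpha = 0.05  # assumed significance level
for name, p in [("Mann-Whitney U", p_two), ("Kruskal-Wallis", p_three), ("Chi-square", p_cat)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```

The choice among these tests follows the data type: rank-based tests for numerical variables without a normality assumption, and the chi-square test for counts of categorical variables.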
In the case of skewed data, it is important to consider appropriate correction methods to ensure accurate and reliable results. A few correction approaches are:
• Non-parametric tests: Non-parametric tests tend to be more resilient to skewed data, as they do not depend on particular distribution assumptions.
• Transformations: One way to make the distribution more symmetric is to apply mathematical transformations such as the logarithmic, square root, or Box-Cox transformation.
• Bootstrapping: Bootstrapping is a resampling technique that allows estimation of the sampling distribution of a statistic without making any assumptions about the underlying distribution.
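Two of the approaches listed above, transformation and bootstrapping, can be sketched as follows. The data are synthetic draws from a log-normal distribution used as a hypothetical right-skewed variable; the `skewness` helper, the number of resamples, and the 95% interval are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic right-skewed sample (hypothetical, not study data).
skewed = rng.lognormal(mean=1.0, sigma=0.8, size=200)

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Transformation: a logarithm makes this sample more symmetric.
log_values = np.log(skewed)

# Bootstrapping: resample with replacement to approximate the sampling
# distribution of the median, then take a percentile confidence interval.
n_resamples = 2000
boot_medians = np.array([
    np.median(rng.choice(skewed, size=skewed.size, replace=True))
    for _ in range(n_resamples)
])
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

print(f"skewness raw: {skewness(skewed):.2f}, after log: {skewness(log_values):.2f}")
print(f"bootstrap 95% CI for the median: [{ci_low:.2f}, {ci_high:.2f}]")
```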

Cluster Visualization
Visualizing clusters is an essential method for identifying patterns, assessing the efficacy of clustering algorithms, and gaining deeper insights into data structure. To visualize clusters in a lower-dimensional space when handling high-dimensional data, dimensionality reduction techniques can be utilized. We illustrate the effectiveness of our clustering approach by presenting two-dimensional embeddings generated by Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). PCA reduces the dimensions of large data sets by identifying the directions in the data with the highest variance, allowing the data to be projected onto a lower-dimensional space while retaining as much of the original variance as possible [1]. t-SNE is another technique for reducing the dimensionality of large data sets; it converts the distances between data points into probabilities that represent how likely they are to be neighbors and then minimizes the difference between these probabilities in the high- and low-dimensional spaces [3].
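A minimal sketch of how such two-dimensional embeddings might be computed with scikit-learn is given below. The feature matrix is synthetic and hypothetical, and the perplexity, initialization, and random seed are assumed settings rather than those used in the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patients-by-features matrix (hypothetical data).
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(45, 10)),
    rng.normal(4.0, 1.0, size=(45, 10)),
])

# Standardize features so that no single scale dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# PCA: linear projection onto the two directions of highest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# t-SNE: nonlinear embedding that preserves local neighborhoods.
# perplexity must be smaller than the number of samples.
X_tsne = TSNE(n_components=2, perplexity=20, init="pca", random_state=0).fit_transform(X_scaled)

print(X_pca.shape, X_tsne.shape, round(float(explained), 3))
```

Either embedding can then be drawn as a scatter plot colored by cluster label. Distances in a t-SNE plot are best read qualitatively, since the method preserves local rather than global structure.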

EXPERIMENTS

Dataset
This research is based on data collected from adolescent patients at Shriners Children's Hospital, following the guidelines of Setting Scoliosis Straight (SSS), Surgeon Performance Program (SPP),

Implementation Details
Agglomerative hierarchical clustering was employed to identify distinct subphenotypes within the AIS patients after spinal deformity surgery. Euclidean distance was used as the distance metric to calculate the similarity or dissimilarity between patients. This choice was suitable for the numerical variables in the dataset, providing a measure of the overall distance between patients based on their feature values. Ward's linkage method was employed to determine the distance between clusters during the hierarchical clustering process. The optimal number of clusters was determined by analyzing the structure of the dendrogram in figure 2 and identifying points of significant merging. Statistical analysis was conducted to assess the differences between subphenotypes, using a significance level of 0.05. Mann-Whitney U and Kruskal-Wallis non-parametric tests were used to evaluate the significance of differences in numerical radiographic parameters and PROs among subphenotypes, while chi-square tests were used to evaluate the significance of differences in categorical radiographic parameters among subphenotypes.
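The clustering step described above can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the study's implementation: the data are synthetic, and the largest-gap rule on merge heights is only one simple heuristic for reading "points of significant merging" off a dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in for standardized radiographic features (hypothetical data).
rng = np.random.default_rng(0)
group_1 = rng.normal(loc=0.0, scale=1.0, size=(30, 3))
group_2 = rng.normal(loc=10.0, scale=1.0, size=(30, 3))
X = np.vstack([group_1, group_2])

# Agglomerative clustering with Euclidean distance and Ward's linkage.
# Z has one row per merge; column 2 holds the merge height.
Z = linkage(X, method="ward", metric="euclidean")

# Heuristic: the largest jump between consecutive merge heights marks a
# natural place to cut the dendrogram.
heights = Z[:, 2]
gaps = np.diff(heights)
k_suggested = X.shape[0] - 1 - int(np.argmax(gaps))

# Cut the tree into a fixed number of clusters (labels start at 1).
labels_k2 = fcluster(Z, t=2, criterion="maxclust")
labels_k3 = fcluster(Z, t=3, criterion="maxclust")

print(k_suggested, np.bincount(labels_k2)[1:], np.bincount(labels_k3)[1:])
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the corresponding tree, so the same linkage matrix serves both the visual inspection and the cluster assignment.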

Distinct Subphenotypes within the AIS Patients after Spinal Deformity Surgery
The hierarchical clustering analysis revealed the presence of distinct subphenotypes. Based on the evaluation results, solutions with 2 and 3 clusters were retained as candidates, with average silhouette scores of 0.296 and 0.091 and Davies-Bouldin indices of 1.776 and 2.778, respectively, as shown in figures 3 and 4. Both metrics favor the two-cluster solution, since a higher silhouette score and a lower Davies-Bouldin index indicate better-separated and more compact clusters.
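The two validation metrics can be computed for competing values of k as in the sketch below, which uses scikit-learn on synthetic data. The data, the seed, and the `results` dictionary are hypothetical; the numbers it prints are unrelated to the values reported for the study cohort.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic data with two well-separated groups (hypothetical, not study data).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(40, 4)),
    rng.normal(loc=6.0, scale=1.0, size=(40, 4)),
])

results = {}
for k in (2, 3):
    # Ward linkage in scikit-learn operates on Euclidean distances.
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    results[k] = {
        # Silhouette lies in [-1, 1]; higher means better separation.
        "silhouette": silhouette_score(X, labels, metric="euclidean"),
        # Davies-Bouldin is non-negative; lower means more compact,
        # better-separated clusters.
        "davies_bouldin": davies_bouldin_score(X, labels),
    }

for k, scores in results.items():
    print(k, round(scores["silhouette"], 3), round(scores["davies_bouldin"], 3))
```

Because the two metrics point in opposite directions (higher is better for one, lower for the other), reporting them side by side for each k makes the comparison between candidate solutions easier to read.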

Differences in PROs per Subphenotypes
PROs are evaluations of a patient's health and personal experiences, as reported directly by the patients themselves. In our study of the potential two distinct subphenotypes, we observed notable differences in PROs, as shown in table 2. These differences shed light on the varying postoperative experiences.
Regarding the potential three distinct subphenotypes, Subphenotype 0 reported a higher pain relief score, lower satisfaction, and lower self-image compared to Subphenotypes 1 and 2. In contrast, patients assigned to Subphenotype 1 demonstrated higher Mental Health, Satisfaction, and Self-Image scores, as shown in table 4.
¹ https://www.srs.org/professionals/online-education-and-resources/patient-outcome-questionnaires

Specific Radiographic Parameters that Differentiate Subphenotypes
Further analysis of postoperative radiographic parameters among the distinct subphenotypes identified several key variables that demonstrated statistically significant differences and contributed to subphenotype differentiation.
In the case of two distinct subphenotypes, subphenotype 0 exhibited significantly lower Lateral Radiographs T5-T12 compared to subphenotype 1, as shown in table 3. Additionally, the Posterior/Anterior Radiographs (PAR) Thoracic Apical Translation to CSVL, PAR Thoracic Curve, and PAR Upper Thoracic Curve differed significantly across subphenotypes.
For the case of three distinct subphenotypes, subphenotype 1 remains identical to cluster 1 identified for k = 2; the biggest difference is that the previous cluster 0 for k = 2 is split into subphenotypes 0 and 2 for k = 3, as shown in table 7. Subphenotype 0 showed significantly lower Lateral Radiographs Junctional Kyphosis - Distal compared to subphenotype 1 and subphenotype 2.
Subphenotype 0 has a significantly lower absolute difference in T5-T12 measurements before and after surgery compared to subphenotypes 1 and 2. Subphenotype 1, on the other hand, has a higher absolute difference in T5-T12, Coronal C7 to CSVL, and Upper Thoracic Curve measurements before and after surgery compared to the other two subphenotypes.
Upon analyzing figures 3 and 4, we observed that subphenotypes 0 and 2 in figure 4 are a split of subphenotype 0 in figure 3. The main distinguishing radiographic features are ThL-Lumbar Apical Translation, Thoracic Curve, and Upper Thoracic Curve.Subphenotype 2 displays a significantly lower absolute difference in Thoracic Curve and Upper Thoracic Curve, but a higher absolute difference in ThL-Lumbar Apical Translation measurements before and after surgery compared to subphenotypes 1 and 2.
According to the SRS-22 measurements, there are significant differences between subphenotypes in terms of the mental health, pain, satisfaction, and self-image SRS domains. As shown in table 8, Subphenotype 0 has a higher score in improved mental health compared to subphenotypes 1 and 2. Subphenotype 2 has the highest score in improved pain, while subphenotype 1 has a higher score

DISCUSSION
To further highlight the clinical relevance of the identified subphenotypes and their impact on postoperative outcomes, we present a case study involving two case pairs.
In our study of the potential three distinct subphenotypes, we identified two patients in subphenotype 1, patient Id1 and patient Id2, with distinctive preoperative characteristics and demographic profiles: a 12-year-old African American male and a 15-year-old white female, respectively. Their preoperative measurements were Lateral C7 to Sacrum of 10 and 28, Lordosis of 64.7 and 52.5, T2-T5 of 22.10 and 12.50, T5-T12 of 14.70 and 21.80, Lumbar Bend of 7.9 and 0.49, Thoracic Apical to C7 Plumb of −9.31 and 26.7, Thoracic Bend of 44.9 and 19.6, and Upper Thoracic Curve of 21.6 and 42.2. Both patients underwent surgery to correct severe AIS curvature. Patients Id1 and Id2 experienced excellent postoperative outcomes, with mental health scores increasing from 3.88 and 3.84 to 5 each. Both patients also reported high levels of satisfaction, with scores rising from 3.8 and 3.9 to 5 each, and improved self-image, with scores rising from 3.16 and 3.04 to 4.40 and 4.80, respectively. Despite their distinctive preoperative and surgical profiles, this case pair demonstrates the existence of substantial within-cluster conformity in postoperative outcomes.
In subphenotype 0, we identified another case pair, patient Id3 and patient Id4, who exhibited similar preoperative characteristics and demographic profiles: C7 to Sacrum of 35 and 41, Lordosis of 47.10 and 50.90, T10-L2 of 9.86 and 8.4, T5-T12 of 2.16 and 3.07, Coronal C7 to CSVL of 1.99 and 1.47, and Thoracic Apical to C7 Plumb of 31.50 and 39.90. These two patients underwent the same surgery but had comparatively low postoperative mental health scores of only 4.04 each. Their self-image scores were also low, at 4.08 each.

CONCLUSION
Data-driven subphenotyping of AIS patients after spinal deformity surgery sheds light on the heterogeneity within this patient population and its implications for postoperative outcomes. By utilizing a comprehensive dataset and employing hierarchical clustering analysis, we successfully identified distinct subphenotypes. The findings of our study contribute to the understanding of AIS by highlighting the importance of subphenotyping in personalized treatment approaches. We demonstrated that patients from different subphenotypes exhibited distinct postoperative outcomes, underscoring the potential of subphenotyping as a predictive tool for personalized treatment strategies and optimizing surgical decision-making.
The identification and characterization of specific radiographic parameters that differentiate subphenotypes further enhance our understanding of the anatomical variations and spinal alignments associated with each subphenotype. Furthermore, our study highlights the clinical relevance of subphenotyping by demonstrating significant differences in patient-reported outcomes (PROs) among the identified subphenotypes. These findings have important implications for treatment planning, allowing for a more targeted approach to patient care. Our study contributes to the growing body of knowledge on AIS by employing a data-driven subphenotyping approach to better understand the heterogeneity of AIS patients after spinal deformity surgery. The insights gained from this study have the potential to inform personalized treatment strategies, optimize patient selection for specific interventions, and improve overall patient care in AIS management.

Figure 1: Analysis schema: (a) Data collection for subphenotype derivation. (b) Data pre-processing for clustering analysis. (c) Derivation of subphenotypes via agglomerative hierarchical clustering; clustering results were validated using the silhouette score and Davies-Bouldin index. (d) Statistical analysis to identify biomarkers. (e) Dimensionality reduction to interpret subphenotypes.

Figure 2: Dendrogram from hierarchical clustering to find the number of clusters (k): Dendrograms are tree-like diagrams that represent the arrangement of data points based on their similarity or dissimilarity.

Figure 3: Agglomerative Clustering: The middle plot depicts the visualization of clusters when k = 2, using PCA. Similarly, the right plot showcases the cluster visualization when k = 2, but using t-SNE. On the left, the silhouette plot displays the scores of each patient; the average silhouette score obtained was 0.296, higher than that obtained for k = 3.

Figure 4: Agglomerative Clustering: The middle plot depicts the visualization of clusters when k = 3, using PCA. Similarly, the right plot showcases the cluster visualization when k = 3, but using t-SNE. The silhouette plot on the left displays the scores of each patient, with an average silhouette score of 0.091, since there is no distinct separation between clusters 0 and 2.

Table 1: Description of radiographic measurements, including general radiographic information, posterior/anterior radiographic measurements, lateral radiographic measurements, and trunk shape.
The dataset includes preoperative and postoperative follow-up information from 428 patients, including radiographic data, patient-reported outcomes (PROs), and demographic data such as age, race, smoking history, comorbidities, and neurological results. The PROs were collected using the Scoliosis Research Society 22R (SRS-22R) questionnaire¹, which measures health-related quality of life in patients with adolescent idiopathic scoliosis across five domains: function, mental health, pain, satisfaction, and self-image. The curated radiographic parameters from the full-length, posterior/anterior, and lateral spine radiographs were manually measured by clinicians. A full description of radiographic features is shown in Table 1.

Table 2: Mann-Whitney U non-parametric test results comparing the two clusters on the SRS-22 questionnaire.

Table 4: Kruskal-Wallis non-parametric test results comparing the three clusters on the SRS-22 questionnaire.

Table 7: Kruskal-Wallis non-parametric test results on 6-month postoperative numerical radiographic parameters, comparing the three clusters.

Table 8: Pre- and post-operation SRS-22 questionnaire scores, comparing the three clusters.