Assessing the Efficacy of CEA and CA19.9 in Predicting Ovarian Cancer: Based on Logistic Regression Analysis

Ovarian cancer poses a significant health risk because it is typically detected at a late stage and has a high fatality rate. This work uses logistic regression to build a prediction model for ovarian cancer, with an emphasis on assessing the potential of CEA and CA19.9 as biomarkers. The model is built on a dataset of clinical information from patients with ovarian tumors. The findings show that CEA plays a significant role in predicting ovarian cancer, whereas the well-known tumor marker CA19.9 does not show a statistically significant association with ovarian cancer, and including CA19.9 values does not significantly improve the prediction accuracy of the logistic regression model. The research also clarifies the link between CEA, CA19.9, and ovarian cancer and presents a comparatively effective model for predicting ovarian cancer. The results emphasize the role of CEA in ovarian cancer prediction, contradict some previously published findings, and imply that CA19.9 does not play a significant role in the early identification of ovarian cancer. By analyzing the contribution of CEA and CA19.9 to diagnostic precision, this study advances the understanding of ovarian cancer prediction.


INTRODUCTION
According to Bray et al. [1], ovarian cancer (OC), one of the most lethal malignancies in women, accounts for 3-4% of all female cancer diagnoses and kills 13,940 individuals annually [2]. Ovarian cancer is difficult to recognize in its early stages and is usually diagnosed at an advanced stage because its symptoms, such as abdominal bloating and swelling or increased urinary urgency, are nonspecific and diffuse. The long-term survival rate is about 90% when ovarian cancer is detected early and confined to the ovaries, but overall survival remains relatively low because sensitive and effective methods for early detection are lacking. Consequently, research into suitable prediction models and reliable biomarkers for early detection and treatment is required to increase the survival rate of patients with the disease.
Various biomarkers have been explored for the early detection of ovarian cancer. Carbohydrate antigen 125 (CA-125) is the most widely used ovarian cancer biomarker [3]. It is a component of the epithelia of the human female reproductive tract, where it helps create a hydrophilic environment that protects against invading particles and agents [4]. The normal CA-125 level is below 35 U/mL [5], but the level typically rises in the blood of ovarian cancer patients [4]. Human epididymis secretory protein 4 (HE4) is another notable biomarker for the prediction of ovarian cancer [3]. Beyond these widely used biomarkers, other antigens and proteins are also considered possible biomarkers for ovarian cancer prediction. Carbohydrate antigen 19-9 (CA 19-9) plays an important role in predicting pancreatic cancer [5] and has also been used in ovarian cancer detection. Carcinoembryonic antigen (CEA) is a widely used tumor biomarker that is significant for the prediction and prognosis of colon cancer and breast cancer [6].
These biomarkers also have limitations. For example, approximately 20% of ovarian cancers do not show elevated CA-125 levels [3], and CA-125 is also elevated in other diseases [5]; CA-125 alone is therefore neither sensitive nor specific enough to detect ovarian cancer. In addition, HE4 is not elevated in every type of ovarian cancer [4]. Khan et al. [7] evaluated the utility of CA 19-9 as an ovarian cancer biomarker and found that CA 19-9 is better than CA-125 at predicting the underlying histological subtype of an ovarian tumor. However, their study included only 45 eligible samples, which limits the strength of its conclusions; a larger sample size is needed to evaluate the usefulness of CA 19-9.
Biomarker-based screening strategies for ovarian cancer have been investigated, but investigations have concentrated primarily on CA-125 and HE4. Because CA 19-9 is still under investigation as a potential biomarker, research is required to determine whether combining CA-125, HE4, CA 19-9, and CEA is beneficial in predicting ovarian cancer. Using these four biomarkers, this study builds a model to evaluate the sensitivity and effectiveness of ovarian cancer prediction and to explore the usefulness of CA19.9 in that prediction.

METHODOLOGY
This study analyzed data from Lu et al. [8], who collected the dataset at the Third Affiliated Hospital of Soochow University between July 2011 and July 2018. The dataset covers 171 ovarian cancer patients and 178 patients with benign ovarian tumors and records the levels of various biomarkers and other measured substances for each patient. Each patient's diagnosis was determined by pathology following surgery. None of the patients with ovarian cancer underwent radiotherapy or chemotherapy prior to surgery. The World Health Organization (WHO) classification system was used to determine the histological type.
CA 19-9 is a glycoprotein tumor marker used in the diagnosis of various gastrointestinal neoplasms. It is also very important in determining the prognosis of biliary tract, pancreatic, and colorectal cancers, although it is most accurate as a tumor marker for pancreatic cancer. Beyond its diagnostic and prognostic roles in gastrointestinal malignancies, there is evidence that CA 19-9 is elevated in non-gut neoplasms such as dermoid cysts and even in non-neoplastic inflammatory conditions such as inflammatory bowel disease, liver cirrhosis, and cystic fibrosis. Few studies have examined the utility of CA 19-9 in the diagnosis of ovarian cancers [7].
The application of CEA has been described previously. In essence, CEA is one of the most frequently employed serum biomarkers for the detection of solid malignant carcinomas such as epithelial ovarian cancer, colorectal cancer, and lung cancer [9]. As a single blood biomarker, however, CEA does not have sufficiently high diagnostic sensitivity or specificity to identify ovarian cancer. The diagnostic value may increase when multiple serum biomarkers are measured together [9].
In addition, HE4 and CA-125, the two most widely accepted biomarkers for predicting ovarian cancer, as well as age and menopausal status, are also considered when building the model.
To analyze the dataset, missing values are first imputed with medians, and summary statistics are calculated for the key biomarkers. Violin plots illustrate the data distributions and highlight biomarker trends specific to tumor type. Correlation coefficients between biomarkers are calculated and displayed in a heatmap. To evaluate predictive ability, the analysis also builds baseline tables, produces ROC curves for the individual variables and for the selected combinations of variables, and calculates AUC values. The response variable is the tumor type (benign ovarian tumor or ovarian cancer), and the predictors are age, biomarker concentrations, and menopausal status. Several logistic regression models are fitted, and their coefficients indicate the strength of the association between the selected variables and the probability of ovarian cancer. Nomograms are developed to show how the predictors relate to the predicted probability of cancer. The p-values are examined to assess the significance of the selected variables. In summary, the statistical analysis methodically handles missing data, investigates data distributions, evaluates correlations, and builds prediction models. Logistic regression provides an understanding of how biomarkers and demographic variables affect the tumor type, and the analysis offers a basis for understanding the relationships in the data and for selecting reliable biomarkers for ovarian cancer.
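The imputation and model-fitting steps described in this section can be illustrated with a minimal sketch. It is not the code used in this study: the marker values and labels are invented toy data, the function names (`impute_median`, `fit_logistic`, `predict`) are hypothetical, and plain gradient descent on a centered predictor stands in for whatever solver the statistical software applies. It only shows the general technique of median imputation followed by a one-predictor logistic regression.

```python
import math
from statistics import mean, median


def impute_median(column):
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X is a list of feature rows, y a list of 0/1 labels.
    Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(p):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Invented toy values for one marker (None = missing); 1 = cancer, 0 = benign.
raw_marker = [1.0, None, 2.0, 6.0, 7.0, None, 8.0, 1.5, 5.0, 3.0]
labels = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]

marker = impute_median(raw_marker)
center = mean(marker)  # centering is an assumption made here to help convergence
X = [[v - center] for v in marker]
intercept, coefs = fit_logistic(X, labels)


def predict(value):
    """Predicted probability of cancer for a raw marker value."""
    return sigmoid(intercept + coefs[0] * (value - center))
```

In this toy example a positive fitted coefficient means that higher marker values raise the predicted probability of cancer, which is the direction of association the violin plots are used to inspect.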

RESULTS
The dataset comprised 349 patients with either benign or malignant ovarian tumors. CA-125, HE4, CEA, CA19.9, age, and menopausal status were used to construct the models in this investigation. The violin plots show that the likelihood of ovarian cancer increases with increasing concentrations of CA-125, HE4, CEA, and CA19.9. Similarly, older patients have a higher risk of ovarian cancer. Regarding menopausal status, postmenopausal women are less likely to have ovarian cancer than premenopausal women.
Table 1 shows the basic characteristics of the selected features in the study population. The violin plot results are supported in particular by the median ages: 36 for patients with benign ovarian tumors and 53 for those with ovarian cancer. Similar findings hold for the other variables: CA-125 and HE4 concentrations differ noticeably between ovarian cancers and benign ovarian tumors, whereas the increase in CA19.9 and CEA levels is only marginal. Table 2 shows the coefficients of each selected model.
According to Figure 1, HE4 has the highest true positive rate and the lowest false positive rate, whereas CA19.9 has the lowest true positive rate and the highest false positive rate. CA-125 is the second most accurate biomarker among the chosen variables. However, the individual variables have comparatively high false positive rates, so testing for ovarian cancer with a single variable is insufficient. Six logistic models were developed from various combinations of the target variables, including variations in the three different types of biomarkers. The two best-fitting models (Logic 1 and Logic 2) were chosen using the ROC curves and AUC values and are examined in Figure 2 together with the ROC curve for each chosen variable. Logic 1 includes all of the chosen variables, while Logic 2 includes age, CA-125, CEA, HE4, and menopausal status. At a false positive rate of 25%, the ROC curves show that Logic 1 and Logic 2 reach a true positive rate of about 90%, which is higher than that of any individual variable. A further logistic model (Logic 3), consisting of age, CA-125, CA19.9, HE4, and menopausal status, was fitted to assess the importance of CEA.
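To make the ROC and AUC comparison concrete, the following sketch shows one common way to obtain the (false positive rate, true positive rate) points and the area under the curve from predicted scores. The scores and labels are invented toy values, and the helper names `roc_points` and `auc` are hypothetical; this illustrates the general technique rather than the software used in this study.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold step by step.

    labels are 1 for cancer and 0 for benign; tied scores are handled as one step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every case sharing the current score before emitting a point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented toy scores (e.g. predicted probabilities) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
```

Reading a curve at a fixed false positive rate, as is done here at 25%, corresponds to looking up the true positive rate of the point in `curve` whose first coordinate is 0.25.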

DISCUSSION
According to Table 2, the exponentiated coefficients for Logic 1, excluding the intercept, include 0.945 for age and 1.064 for menopause. Among the four biomarkers, the value for CA19.9 is the closest to 1 and the value for CEA is the lowest, indicating that CEA has a reasonably substantial impact on ovarian cancer prediction whereas CA19.9 is the least important of the variables chosen. Age and HE4 levels also contribute significantly to the diagnosis of ovarian cancer. This statistical finding also explains the ROC curves of the two logistic models: the curves of Logic 1 and Logic 2 are remarkably similar, and the only difference between the models is whether CA19.9 is included, so CA19.9 levels have little impact on the output of the logistic models. In Logic 3, by contrast, removing CEA as a predictor produces a comparatively clear decrease in the accuracy of the prediction model, which means CEA has a significant impact on prediction precision. Compared with the individual-variable curves, prediction accuracy is considerably higher for the logistic model curves, and the combination of all selected variables has the highest accuracy of the combinations examined.
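The exponentiated coefficients discussed in this paragraph are odds ratios. As a small sketch of how such a value is obtained and read, the code below recovers the reported age value of 0.945 from the coefficient it implies. The function name `odds_ratio` is hypothetical, and treating age as measured in years is an assumption.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds of the outcome when a predictor rises by delta units."""
    return math.exp(coef * delta)


# Coefficient implied by the reported age odds ratio of 0.945, i.e. ln(0.945).
age_coef = math.log(0.945)

or_per_year = odds_ratio(age_coef)          # recovers the reported 0.945
or_per_decade = odds_ratio(age_coef, 10.0)  # equals 0.945 ** 10 (age in years is assumed)
```

Because an odds ratio describes the change per unit of its predictor, comparing how far the values lie from 1 across biomarkers is most informative when the biomarkers are on comparable units or scales.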
Guo et al. [9] conducted a meta-analysis of 12 articles on the combined detection of CA-125, CA19.9, and CEA for detecting ovarian cancer. They concluded that diagnosing ovarian cancer with a single serum marker (CA-125, CA19.9, or CEA) has limited sensitivity and specificity, which agrees with the individual-variable analysis in this study. Wan et al. [10] suggested that it is crucial to measure CEA together with other markers to prevent missed diagnoses, because the positive rate of CEA in patients with serous ovarian cancer is low and the sensitivity of CEA in detecting ovarian cancer was only 51.64% in their study. They concluded that serum CA-125, HE4, and CEA levels were significantly correlated with the onset and progression of epithelial ovarian cancer and that their combined detection was crucial for early diagnosis and for evaluating ovarian cancer prognosis. Their findings corroborate the findings on CEA in this study. Regarding CA19.9, this study reaches a different result from Khan et al. [7], who concluded from ANOVA and an independent-sample t-test that CA19.9 is more helpful than CA-125 in the diagnosis of ovarian cancer. However, Khan et al. [7] included only 45 eligible participants, whereas the dataset for this study included 349 participants, which suggests that the present results may be more reliable. Additionally, a wide age range among participants is important for determining the role that age plays in the prediction of ovarian cancer. The mean age of the participants in their study was 46.36±6.44, whereas ages in this study ranged from 15 to 83, so accounting for the effect of age may allow a sounder conclusion.
The study's limitations should be taken into account when evaluating the findings. Because the precision of the estimates may be constrained by the number of participants, future research with a larger sample size is still required to fully understand the significance of CA19.9 in the prediction of ovarian cancer. Because all participants are Chinese, the generalizability of the model results may also be limited by region. Finally, because only a limited set of biomarkers was collected, more effective and efficient models may remain to be examined in future studies.

CONCLUSION
In conclusion, this study evaluates the effectiveness of CEA and CA19.9 and finds that CEA plays a comparatively significant role in the prediction of ovarian cancer, whereas CA19.9 does not. These findings encourage future research into the underlying biomarkers and techniques for ovarian cancer prediction.

Figure 2: ROC curves of selected logistic regression models.

Table 1: Basic information on biomarkers.

Table 2: The coefficients of each selected model.