Breast Cancer Detection with Topological Machine Learning

Screening for breast cancer using mammograms and ultrasound images is an essential but time-consuming and expensive process that requires a trained clinician’s interpretation. To address this issue, machine learning (ML) methods have been developed in recent years as clinical decision-support tools. However, most of these algorithms face challenges related to computational feasibility, reliability, and interpretability. We present a new approach for feature extraction in mammograms and ultrasound images using topological data analysis (TDA) methods. The proposed method uses persistent homology to capture distinct topological patterns in healthy and unhealthy patient images, which are then transformed into powerful feature vectors. These vectors are combined with standard ML techniques to create the Topo-BRCA model, which provides competitive results with state-of-the-art deep learning (DL) models in several benchmark datasets. Unlike most DL models, Topo-BRCA does not require data augmentation or preprocessing and is effective for both small and large datasets. Additionally, the topological feature vectors can easily be integrated into future DL models to enhance their performance further.


INTRODUCTION
Breast cancer is a global health problem with a significant impact on women's health, as it is the most prevalent cancer and leads to more disability-adjusted life years (DALYs) lost than any other type of cancer [39]. In 2020 alone, 2.3 million women were diagnosed with breast cancer, resulting in 685,000 deaths worldwide. Additionally, 7.8 million women who were previously diagnosed with breast cancer were alive at the end of 2020. Early diagnosis is crucial for the successful treatment of breast cancer, and mammography is a commonly used screening method to detect it [12]. However, due to limitations such as small datasets, long pre-processing times, and the lack of interpretability in decision-making, the existing machine learning (ML) algorithms for breast cancer detection have not been implemented at the clinical stage.
To address this challenge, we propose using topological data analysis (TDA) techniques. TDA has been effective in feature extraction and can reveal hidden patterns in data. By identifying relevant topological features and combining them with ML models, we can create more reliable and transparent models for breast cancer detection. Our proposed approach aims to improve early diagnosis, which is crucial for successful treatment. Our work seeks to contribute to reducing breast cancer mortality and improving women's health globally.
Our proposed method uses TDA to identify distinct topological patterns in breast ultrasound and mammogram images that distinguish healthy and cancerous images (Figures 4 and 5). The main tool used in TDA, persistent homology, converts these patterns into highly effective feature vectors that, when paired with suitable ML models, produce strong results on benchmark datasets (see Table 2). Our model is computationally efficient and robust, does not require any data augmentation or pre-processing, and can easily be combined with deep learning methods to boost their performance.

Our contributions:
• We study a novel approach to breast cancer diagnosis by applying the latest topological data analysis methods.
• By studying the evolution of topological patterns in mammogram and ultrasound images, we observe that normal and abnormal images produce distinct topological patterns (Figures 4 and 5).
• With our unique topological feature extraction method, our computationally feasible model gives competitive results in detecting breast cancer for mammograms, although it falls behind for ultrasound images.
• With our powerful topological descriptors, our proposed model is highly explainable and interpretable (Section 4.2).
• Our topological feature vectors provide a key ingredient for any future ML and DL models in the domain to boost their performance and improve robustness.

RELATED WORK
2.1 ML in Breast Cancer Detection
In recent years, there has been significant progress in ML tools, and they have been widely employed in breast cancer image classification. In [43]  On the other hand, in [2], the authors proposed the Full resolution Convolutional Network (FrCN) within a CAD framework for X-ray mammograms to detect masses and classify them as malignant or benign. The publicly accessible and annotated INbreast database was used to evaluate the proposed integrated CAD framework in terms of classification, detection, and segmentation accuracy.
Similarly, in [9], the authors study the effect of transfer learning and experimentally determine the fine-tuning strategy to use when training a CNN model. They fine-tuned some of the most powerful recent CNNs and achieved better results than other state-of-the-art methods on the same public datasets. After pre-processing and normalizing all the Regions of Interest (ROIs) obtained from the full mammograms, they combined all the datasets into one large image dataset and used it to fine-tune the CNNs. In [27], a new CAD system is proposed for classifying benign and malignant mass tumors in breast mammography images. The deep convolutional neural network (DCNN) architecture AlexNet is used and fine-tuned to classify two classes. The last fully connected (fc) layer is connected to a support vector machine (SVM) classifier to obtain better accuracy on two publicly available datasets: (1) the Digital Database for Screening Mammography (DDSM); and (2) the Curated Breast Imaging Subset of DDSM (CBIS-DDSM).
There are several other works on breast cancer diagnosis with deep learning methods. For a thorough review and comparison of these approaches, see the excellent surveys [34], [17], [31].

TDA in Image Analysis
Persistent homology (PH) has been quite effective for pattern recognition in image and shape analysis in the past two decades. In medical image analysis, PH has produced powerful results in cell development [22], tumor detection [10], neuronal morphology [19], brain artery trees [5], fMRI data [30], and genomic data [6].
See the excellent survey [36] for a thorough review of TDA methods in biomedicine. For a collection of TDA applications in several domains, see the TDA Applications Library [16]. For an introduction to applications of TDA in biology, see the book on the subject [26].

BACKGROUND
3.1 Persistent Homology
Persistent homology (PH) is a mathematical concept and computational tool used in topological data analysis (TDA) to analyze and understand the shape and structure of complex data sets. In general, it can be applied to various forms of data, including point clouds, graphs, and images. PH allows us to systematically assess the evolution of various hidden patterns in the data as we vary a scale parameter [8]. While PH is a very effective data mining method for many data types (e.g., point clouds, networks), here we only summarize the PH setup for image data, in particular cubical persistence. For details on other forms of data, see [13].
In practice, the PH machinery is a 3-step process. For a given image $\mathcal{X}$ (say, of $r \times s$ resolution), the first step is to create a nested sequence of binary images (a.k.a. cubical complexes). To create such a sequence, one can use the grayscale (or other color channel) values $\gamma_{ij}$ of each pixel $\Delta_{ij} \subset \mathcal{X}$. In particular, for a sequence of grayscale values $t_1 < t_2 < \dots < t_N$, one obtains a nested sequence of binary images $\mathcal{X}_1 \subset \mathcal{X}_2 \subset \dots \subset \mathcal{X}_N$ such that $\mathcal{X}_n = \{\Delta_{ij} \subset \mathcal{X} \mid \gamma_{ij} \le t_n\}$ (Figure 1). In other words, we start with an empty $r \times s$ image and activate (color black) each pixel once its grayscale value reaches the given threshold. This is called the sublevel filtration for $\mathcal{X}$ with respect to the given function (grayscale in this case). Then, in the second step, PH captures the evolution of topological features in this sequence and records it as a persistence diagram (PD). In particular, if a topological feature $\sigma$ first appears in $\mathcal{X}_m$ and disappears in $\mathcal{X}_n$ with $1 \le m < n \le N$, we call $b_\sigma = t_m$ the birth time and $d_\sigma = t_n$ the death time of $\sigma$. Then, the PD is the collection of all such 2-tuples, $\mathrm{PD}_k(\mathcal{X}) = \{(b_\sigma, d_\sigma)\}$, where $k$ represents the dimension of the topological features. The difference $d_\sigma - b_\sigma$ is called the lifespan of the topological feature.
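The first step above, building the sublevel filtration, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 3 × 3 image, its pixel values, and the helper names `sublevel_filtration` and `is_nested` are hypothetical and do not come from the paper's code.

```python
def sublevel_filtration(image, thresholds):
    """Return one binary image per threshold t: a pixel is active (1) when its value <= t."""
    return [
        [[1 if value <= t else 0 for value in row] for row in image]
        for t in thresholds
    ]


def is_nested(smaller, larger):
    """Check that every pixel active in `smaller` is also active in `larger`."""
    return all(
        a <= b
        for row_s, row_l in zip(smaller, larger)
        for a, b in zip(row_s, row_l)
    )


# Hypothetical 3 x 3 grayscale image with values in 1..5 (illustrative only).
image = [
    [1, 4, 2],
    [5, 3, 5],
    [2, 4, 1],
]
thresholds = [1, 2, 3, 4, 5]

filtration = sublevel_filtration(image, thresholds)

# Number of active pixels in each binary image of the filtration.
active_counts = [sum(map(sum, binary)) for binary in filtration]

# Each binary image should be contained in the next one.
nested = all(is_nested(a, b) for a, b in zip(filtration, filtration[1:]))
```

Because a pixel activated at one threshold stays active at every larger threshold, the binary images form the nested sequence that persistent homology requires.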
PDs, being collections of 2-tuples, are not very practical to use with ML tools. Instead, a common approach is to convert the PD information into a vector or a function, a step called vectorization [13], which is the final step of the PH process. A common function for this purpose is the Betti function, which keeps track of the number of "alive" topological features at a given threshold. In particular, the Betti function is a step function with $\beta_0(t_n)$ the count of connected components in the binary image $\mathcal{X}_n$, and $\beta_1(t_n)$ the number of holes (loops) in $\mathcal{X}_n$. In ML applications, Betti functions are usually taken as a vector $\vec{\beta}_k$ of size $N$ with entries $\beta_k(t_n)$ for $1 \le n \le N$, i.e., $\vec{\beta}_k(\mathcal{X}) = [\beta_k(t_1)\ \beta_k(t_2)\ \dots\ \beta_k(t_N)]$, e.g., $\beta_0(1) = 3$ is the count of components in $\mathcal{X}_1$ and $\beta_1(4) = 2$ is the count of holes (loops) in $\mathcal{X}_4$. There are several other ways to convert PDs into a vector, e.g., Persistence Landscapes, Persistence Images, and Silhouettes [13], but to keep the model interpretable, we specifically chose to use Betti functions in this work, as the others are not easy to interpret.
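To make the vectorization step concrete, the following minimal sketch turns a persistence diagram into a Betti vector by counting the features alive at each threshold. The diagram `pd0`, the function name `betti_vector`, and the half-open convention (a feature is alive for birth ≤ t < death) are assumptions made for illustration.

```python
def betti_vector(diagram, thresholds):
    """For each threshold t, count the features with birth <= t < death."""
    return [
        sum(1 for birth, death in diagram if birth <= t < death)
        for t in thresholds
    ]


# Hypothetical 0-dimensional persistence diagram given as (birth, death) pairs.
# float("inf") marks a component that never dies (illustrative values only).
pd0 = [(1, float("inf")), (1, 3), (2, 4)]
thresholds = [1, 2, 3, 4, 5]

beta0 = betti_vector(pd0, thresholds)
```

Each entry of the resulting vector is a plain count, which is what keeps the Betti function easy to read compared with other vectorizations.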

BREAST CANCER DIAGNOSIS WITH TDA
4.1 Topo-BRCA for Breast Cancer Detection
In the flowchart (Figure 3), we summarize our machine learning model. For a given breast ultrasound or mammogram image $\mathcal{X}$, we first obtain its grayscale image. Then, by constructing a sublevel filtration with the grayscale function, we obtain the persistence diagrams $\mathrm{PD}_0(\mathcal{X})$ and $\mathrm{PD}_1(\mathcal{X})$. As explained in Section 3.1, the filtration is simply a sequence of black-and-white images in which the dark points represent the pixels with a grayscale value less than or equal to the given threshold (Figure 2). The persistence diagram $\mathrm{PD}_0(\mathcal{X})$ is then the summary of the 0-dimensional topological features (connected components in the images of the sequence), and $\mathrm{PD}_1(\mathcal{X})$ is the summary of the 1-dimensional topological features (loops in the images of the sequence). Finally, we induce functions (topological summaries) from these persistence diagrams to use ML tools more effectively.
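As a rough illustration of this pipeline, the sketch below computes Betti-0 and Betti-1 values directly from a small grayscale image by counting components and holes at each threshold, then concatenates them into one feature vector. It is not the paper's implementation, which uses Giotto-TDA; direct counting is swapped in here to keep the example dependency-free. The image, the function names, and the connectivity convention (active pixels joined through edges or corners, holes taken as bounded edge-connected regions of inactive pixels) are assumptions.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(mask, neighbors):
    """Count connected groups of True cells in a 2D boolean grid (breadth-first search)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in neighbors:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def betti_numbers(image, t):
    """Return (Betti-0, Betti-1) of the binary image of pixels with value <= t."""
    cols = len(image[0])
    active = [[value <= t for value in row] for row in image]
    # Frame the inactive region with one extra pixel so the outside is one component.
    inactive = [[True] * (cols + 2)]
    inactive += [[True] + [not a for a in row] + [True] for row in active]
    inactive.append([True] * (cols + 2))
    beta0 = count_components(active, N8)
    beta1 = count_components(inactive, N4) - 1  # discard the unbounded outside region
    return beta0, beta1


def topological_fingerprint(image, thresholds):
    """Concatenate the Betti-0 and Betti-1 curves into a single feature vector."""
    pairs = [betti_numbers(image, t) for t in thresholds]
    return [b0 for b0, _ in pairs] + [b1 for _, b1 in pairs]


# Hypothetical 5 x 5 image with values in 1..5 (illustrative only).
image = [
    [1, 1, 1, 5, 2],
    [1, 4, 1, 5, 5],
    [1, 1, 1, 5, 3],
    [5, 5, 5, 5, 5],
    [3, 5, 5, 5, 5],
]
fingerprint = topological_fingerprint(image, thresholds=[1, 2, 3, 4, 5])
```

In this toy image, the ring of low-valued pixels encloses one hole that closes once the center pixel is activated at threshold 4, which is the kind of event the Betti-1 curve records.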
In this paper, we mainly used one type of topological summary to convert the persistence diagrams induced from breast ultrasound or mammogram images into topological fingerprints, i.e., induced feature vectors/functions that serve as a unique identifier of an input image. This vectorization gave us two topological fingerprints for each input image, i.e., the Betti-0 and Betti-1 functions. Other vectorization methods can be used in this setting (Section 3.1), but to keep the discussion focused, we only used these two topological fingerprints. For details of Betti functions, see Section 3.1.
We employed the Random Forest and XGBoost algorithms in the machine learning portion of our model. All 200 features can be used as input for these methods. We initially tried dimensionality reduction with PCA, but our models did not perform well. Instead, we used the feature importance values generated by the models to perform feature selection. One can perform this process manually by inspecting the Betti confidence bands. However, to fully automate the process, model-based feature selection is used to choose the most important features and remove collinearity between features, which increases the performance of the ML model. The next step is to apply ML tools to these topological fingerprints. For each dataset, we applied different ML methods, such as Random Forest and XGBoost, to the topological fingerprints of the breast images in the dataset for the classification problem (benign/malignant classes). Feature selection also helps considerably in fine-tuning the models, as it reduces the computation time. In Tables 2, 3, and 4, we give the performance of several variations of our methods obtained by different ML models on our benchmark datasets.

Explainability and Interpretability of Topological Fingerprints
As mentioned in the introduction, one of the main advantages of our model is its explainability and interpretability. In Figures 4 and 5, we illustrate the topological patterns created by each class in breast mammogram and ultrasound images. In these figures, we give the median curves and 40% confidence bands of each class for the corresponding dataset. To obtain the median curves and confidence bands, we used a common method called the nonparametric confidence band for the median [15]. In Figures 4 and 5, we observe that our topological feature vectors (Betti-0 and Betti-1 vectors) successfully distinguish the different classes in mammogram and ultrasound images. As mentioned before, while other vectorization methods for PH are hard to interpret, we use Betti functions as our vectorization method to keep the model interpretable. Recall that in grayscale, the value 0 represents black, and 255 represents white. In our experiments, we renormalized the $[0, 255]$ interval to $[0, 100]$; hence we get 100-dimensional Betti-0 and Betti-1 vectors. In the figures, the $x$-axis represents the grayscale value in $[0, 100]$ (0 is black, 100 is white). For Betti-0 curves, the $y$-axis represents the count of components, and for Betti-1 curves, the $y$-axis represents the count of holes/loops. Hence, for an ultrasound image $\mathcal{X}$ and a grayscale value $t \in [0, 100]$, $\beta_0(t)$ represents the number of components in the binary image $\mathcal{X}_t$ (see Figure 2). For example, in Figure 5-left, we observe that the median curves and confidence bands for the two classes are very different around $t = 10$ and $t = 40$, e.g., $\beta_0(40) \sim 75$ for benign and $\beta_0(40) \sim 125$ for malignant. This means that malignant ultrasound images at $t = 40$ (unnormalized grayscale value 100) have many more components than benign ones. In other words, malignant binary images $\mathcal{X}_{40}$ are more spread out (disconnected) than benign ones. Similarly, in Figure 4-right, we have the Betti-1 curves for mammogram (CC view) images. For the benign class, $\beta_1(50) \sim 350$, while for the malignant class, $\beta_1(50) \sim 200$. This means that benign mammogram images have many more holes (loops) than malignant ones at $t = 50$ (unnormalized grayscale value 125), i.e., benign binary images $\mathcal{X}_{50}$ have more holes than malignant ones (Figure 2).
From an ML perspective, these figures illustrate how strong our feature vectors are. For any image, we get 100-dimensional Betti-0 and Betti-1 vectors. The thinness/thickness of the confidence bands represents how well the given class forms a cluster in $\mathbb{R}^{100}$. In our case, one can consider the median curve as the center of the cluster for each class, with the feature vector of each image $\mathcal{X}_i$ in that class landing somewhere nearby. The separation of the median curves and confidence bands represents the distance between the clusters in the latent (feature) space.
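The idea of a per-class median curve with a surrounding band can be sketched as follows. Note the substitution: the paper uses a nonparametric confidence band for the median [15], whereas this sketch uses a simple pointwise percentile band, which is a different and cruder construction. The Betti vectors, the 30th and 70th percentile cut-offs, and the function name are hypothetical.

```python
from statistics import median, quantiles


def median_curve_and_band(vectors, lower_pct=30, upper_pct=70):
    """Pointwise median and percentile band across the Betti vectors of one class."""
    med, low, high = [], [], []
    for values in zip(*vectors):  # iterate over thresholds (vector positions)
        cuts = quantiles(values, n=100)  # 99 cut points: cuts[p - 1] is the p-th percentile
        med.append(median(values))
        low.append(cuts[lower_pct - 1])
        high.append(cuts[upper_pct - 1])
    return med, low, high


# Hypothetical Betti-0 vectors (3 thresholds) for five images of one class.
benign_vectors = [
    [3, 5, 2],
    [4, 6, 2],
    [5, 7, 3],
    [4, 5, 2],
    [3, 6, 3],
]
med, low, high = median_curve_and_band(benign_vectors)
```

Plotting `med` with the region between `low` and `high` shaded, once per class, gives a picture of the same kind as the curves discussed above: narrow bands indicate a tight cluster, and well-separated bands indicate classes that are far apart in feature space.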

EXPERIMENTS
5.1 Datasets
The BUSI dataset, compiled in 2018 at Baheya Hospital for Early Detection and Treatment of Women's Cancer in Cairo, Egypt, is publicly available [3]. As one of the few publicly available datasets for breast ultrasound, it serves as a benchmark dataset for different ML models in the domain (Table 4). CBIS-DDSM is a revised and standardized edition of the DDSM dataset [21]. It involves a subset of the DDSM data selected and curated by a trained mammographer, revised mass segmentation and bounding boxes, and pathologic diagnosis for training data, configured similarly to modern computer vision datasets. It serves as a benchmark dataset for ML models for mammograms (Table 2). The statistical details of the datasets are given in Table 1.

Experimental Setup
We give the details of our datasets in Table 1. Note that the majority of datasets do not have a predefined train-test split. This is why many models used their own train-test split, as can be seen from our accuracy tables (Tables 2, 3, and 4). We used 5-fold and 10-fold cross-validation in this study. To ensure a fair comparison, the basic details of each method are provided in the accuracy tables. We did not use data augmentation, as the extracted topological features are invariant under rotations and flips and perform well on small and unbalanced datasets.
We used different class configurations for classification. For the ultrasound image dataset, we performed both 3-class and binary classification. For the mammogram dataset, we used only binary classification (benign vs. malignant), as it provides only two classes.
We first obtained each image's topological fingerprints (TFs), generating 200 features, and then used feature selection to select the 50 most important features for the model. This is achieved using the importance weights from the model trained with the default configuration: we selected the 50 features with the highest importance values. We used the SelectFromModel method from the scikit-learn Python package [25] to obtain these features. Then, we obtained our accuracy results by applying ML models (RF, KNN, XGBoost, QDA) to these TFs and fine-tuning the models. In the tables below, we only report our best results. In Table 5, we give the performance of all variations of our models with different combinations of topological fingerprints and ML models.
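A minimal sketch of this feature selection step with scikit-learn is given below. The data are synthetic stand-ins for the 200 Betti features, and the choice of a Random Forest with a fixed random seed as the importance model is an assumption for illustration; the paper's actual configuration may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (assumption): 120 images, 200 Betti features, binary labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 300, size=(120, 200)).astype(float)
y = rng.integers(0, 2, size=120)

# Model with default hyperparameters, used only to rank features by importance.
forest = RandomForestClassifier(random_state=0)

# threshold=-np.inf disables the importance cut-off, so max_features alone
# keeps the 50 features with the highest importance values.
selector = SelectFromModel(forest, threshold=-np.inf, max_features=50)
X_selected = selector.fit_transform(X, y)

# Column indices of the retained features in the original 200-dimensional vector.
selected_idx = selector.get_support(indices=True)
```

Since Betti features are indexed by grayscale threshold, the retained indices can be mapped back to the thresholds (and to Betti-0 or Betti-1) that the model found most informative.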
In this experiment, for 500 tiles, extracting 200 features takes less than 8 minutes on a system with an Intel Core i5-9600K and 16GB of memory (no GPU). We used Giotto-TDA [37] to obtain persistence diagrams and Betti functions. The code is available at https://anonymous.4open.science/r/TOPO-BRCA-F415.

Results
We compare the performance of our Topo-BRCA model with state-of-the-art methods in Tables 2, 3, and 4. In Table 2, we observe that our topological approach gives highly accurate results for mammogram images, outperforming most of the SOTA DL models. Considering that we use no data augmentation or preprocessing, this high performance on such a small dataset shows the robustness of our model. It also indicates that topological feature vectors can be very helpful in obtaining a reliable clinical decision-support method for mammogram images. On the other hand, our experiments show that the topological approach does not work very well for ultrasound images (Tables 3 and 4). One possible reason for this performance is the mixed-resolution images in the BUSI dataset [3]. The values of our Betti functions depend on the resolution, and images of mixed sizes can produce highly different Betti functions for the same class.

Ablation Study.
In our ablation study, we study the relative performance of our topological feature vectors (Betti functions) in different dimensions (Table 5). Our results indicate that combining all dimensions improves the results in general, and

CONCLUSION
Breast cancer is the most prevalent type of cancer in women, and early detection significantly increases the survival rate. Screening for breast cancer using mammograms and ultrasound images is an essential but time-consuming and expensive process that requires a trained clinician's interpretation. In this work, we studied a novel approach to this problem by applying the latest topological data analysis tools. Our computationally efficient model has yielded highly competitive results compared to the latest deep learning models in mammogram screening. It is noteworthy, however, that the same approach does not achieve similar success on ultrasound images. Considering that mammograms remain the primary mode of breast cancer screening, we anticipate that our topological features will be a valuable resource for future research. Furthermore, when fused with state-of-the-art deep learning models, our topological feature vectors have the potential to play a pivotal role in the development of precise and robust topological deep learning models. Our forthcoming studies will focus on this direction.

Figure 1: On the left, we have the original grayscale mammogram image $\mathcal{X}$. The next three images, $\mathcal{X}_0$, $\mathcal{X}_{50}$, and $\mathcal{X}_{100}$, are binary images in the filtration, where in $\mathcal{X}_t$, all pixels with color value $\le t$ are activated (black) and all others are not activated (white). $\beta_0(\mathcal{X}_t)$ and $\beta_1(\mathcal{X}_t)$ represent the number of components and loops in $\mathcal{X}_t$, respectively.

Figure 2: Sublevel filtration. For an image $\mathcal{X}$ of $5 \times 5$ size with the given pixel values, the sublevel filtration is the sequence of binary images $\mathcal{X}_1 \subset \dots \subset \mathcal{X}_5$.

Figure 3: Flowchart of our Topo-BRCA model: For any input image, we obtain its persistence diagrams from its pixel values and derive its topological features (Betti curves). Along with other features, we employ standard ML tools (RF, XGBoost), which provide highly accurate results for breast cancer diagnosis.

Figure 5: Breast Ultrasound: In the figures above, we give the median curves and 40% confidence bands of our topological feature vectors (Betti functions) for each class in the breast ultrasound dataset. The $x$-axis represents grayscale values (renormalized from $[0, 255]$ to $[0, 100]$) and the $y$-axis represents the count of components (Betti-0) or the count of loops (Betti-1).

Table 1: Summary statistics of benchmark datasets for breast cancer images.

Table 2: Accuracy results for tumor diagnosis from mammogram images on the CBIS-DDSM dataset. Note that train-test splits are different.

Table 3: Accuracy results for tumor diagnosis from ultrasound images on the BUSI dataset. Note that train-test splits are different.

Table 4: Accuracy results for tumor diagnosis from ultrasound images on the BUSI dataset for three classes. Note that train-test splits are different.

Table 5: Comparison of the performance of different topological feature vectors on the BUSI dataset. Accuracy results are given in %.