A Comparative ResNet-50, InceptionV3 and EfficientNetB3 with Retinal Disease

Globally, a staggering 2.2 billion individuals suffer from either near or distance vision impairment, of which at least 1 billion cases could have been prevented or remain unaddressed. The predominant causes of such vision impairment at a global scale are refractive errors and cataracts. It is noteworthy that only 36% of individuals with distance vision impairment due to refractive errors and a mere 17% with vision impairment due to cataracts have gained access to suitable interventions. The escalating prevalence of eye diseases underscores the necessity for advanced AI algorithms capable of analyzing intricate image data beyond the reach of conventional methods. For example, the complex nature of retinal imagery necessitates the use of Convolutional Neural Networks (CNNs), which rely on convolution operations for feature extraction. Networks such as ResNet-50, InceptionV3, and EfficientNetB3 exemplify such architectures. In this study, EfficientNetB3, with a 30-epoch training cycle, demonstrated superior performance in retinal disease classification, achieving training and testing accuracies of 99.9% and 96%, respectively. While these findings attest to the potential of using CNNs, particularly EfficientNetB3, for diagnosing retinal diseases, additional research is essential for validating these results across diverse populations and healthcare environments. The integration of these models into telemedicine platforms holds promise for alleviating healthcare system burdens and enhancing patient outcomes.


INTRODUCTION
The global landscape of retinal diseases is undergoing a significant transformation, primarily driven by an aging population [1]. Out of an estimated 2.2 billion individuals with near or distance vision impairment, approximately 1 billion experience conditions that are either preventable or yet to be corrected. These conditions encompass a broad spectrum of eye disorders, including but not limited to refractive errors (88.4 million), cataracts (94 million), glaucoma (7.7 million), corneal opacity (4.2 million), diabetic retinopathy (3.9 million), and trachoma (2 million), as well as presbyopia (826 million) leading to near vision impairment [2]. Cataracts and glaucoma in particular present substantial challenges, as they can lead to irreversible blindness, among other common ocular diseases [3]-[4].
The ongoing COVID-19 pandemic has compounded these challenges by making medical consultations increasingly difficult, thereby escalating the urgency for efficient diagnostic solutions [5]. This sets the stage for leveraging deep learning techniques aimed at streamlining diagnostic processes and reducing patient-physician interaction time [6].
Classification of retinal diseases is fundamentally based on unique characteristics inherent to each condition. For example, cataracts often manifest as blurred vision or night blindness, attributable to protein aggregation in the lens, appearing as white, yellow, or brown discolorations. Glaucoma affects the optic nerve and is generally caused by increased aqueous humor or obstruction in the trabecular meshwork, resulting in elevated intraocular pressure (IOP) that may damage the optic nerve. This condition manifests through symptoms such as eye pain, nausea, vomiting, or severe headaches. Recent studies have explored the classification of diabetic retinopathy based on its severity, mainly using EfficientNet-B3 and other pre-trained models like ResNet50, InceptionV3, EfficientNetB7, and VGG16 for high-resolution image classification [7]-[8].
In the context of this evolving landscape, our study focuses on the application of three prominent transfer learning models (ResNet-50, InceptionV3, and EfficientNetB3) to diagnose and categorize retinal diseases such as cataracts, glaucoma, and diabetic retinopathy. Utilizing these advanced algorithms not only accelerates the diagnostic process but also enhances the precision of disease classification. Consequently, this research addresses the dual challenges posed by the increasing prevalence of vision impairment and the concomitant difficulty in accessing timely healthcare.

RELATED WORKS
Image classification techniques have found applications in diverse domains, ranging from quality control in manufacturing to inventory assessment in logistics, among other tasks. Algorithms renowned for their high accuracy in image classification include Convolutional Neural Networks (CNN), ResNet50, InceptionV3, and EfficientNetB7. Fine-tuning these pre-trained models for specific tasks can further optimize performance.

Eye Disease for the Elderly
Eye diseases affecting the elderly can generally be categorized into three principal types: cataracts, glaucoma, and additional retinal diseases.
2.1.2 Glaucoma. This disease is primarily caused by elevated intraocular pressure, resulting in optic nerve damage. Classified as a group of conditions, glaucoma is characterized by optic nerve deterioration and is a leading cause of irreversible blindness globally.

Other Retinal Diseases:
Vitreous Floaters: Originating from the vitreous (a clear, gel-like substance filling the posterior chamber of the eye), these floaters occur when the vitreous degenerates. This degeneration leads to liquefaction of some regions and clumping in others, potentially detaching from the retina's surface. Consequently, individuals may perceive dark shadows or shapes that float around in their visual field, often resembling small dots, lines, or threads. These floaters move in sync with eye movements and may manifest as brief, flash-like illuminations.
Age-Related Macular Degeneration (AMD): This condition involves the degeneration of the macula, the central part of the retina responsible for visual processing. Predominantly affecting individuals aged 60 and above, AMD can advance to significant vision loss. Symptoms include blurry or distorted vision, wavy lines, haziness, and the presence of dark spots in the central visual field.
Diabetic Retinopathy: Common among diabetic patients, this retinal disease occurs due to elevated blood sugar levels that cause damage to blood vessels and nerves in the retina. This, in turn, deteriorates the retinal layers, as depicted in Figure 1.

Deep Learning
Deep learning originates from the idea of stacking multiple layers in a neural network. A network is generally regarded as deep once it has more than two hidden layers; these layers are organized in a stacked manner, creating a structure that becomes progressively deeper [9].
The most popular algorithm for Image Classification is the Convolutional Neural Network (CNN), which is a type of Deep Neural

ResNet-50.
ResNet50 is a deep neural network based on the convolutional neural network architecture, which was developed in 2015. The ResNet architecture was created to tackle the issue of vanishing gradients in deep neural networks. This problem arises when gradients become too small during backpropagation, impeding effective learning by the network. ResNet addresses it with residual (skip) connections that add a block's input to its output. The architecture of ResNet50 consists of about 50 layers, combining convolutional layers, pooling layers, and fully connected layers. The model was pre-trained on the ImageNet dataset, which comprises about 1.2 million images classified into 1,000 categories. The ResNet-50 architecture can be trained using the backpropagation algorithm and stochastic gradient descent to minimize a loss function, such as cross-entropy, between the predicted output and the ground truth labels. Pre-training on this expansive dataset has made ResNet-50 one of the most widely used CNN models in computer vision applications, and it has also been used to develop agricultural technology [8], [11], [12].
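The residual idea behind ResNet can be illustrated with a minimal sketch: a block computes some transformation F(x) and then adds its own input back, so the signal (and its gradient) always has a direct path through the block. The version below is an assumption-laden toy: it uses small dense layers instead of convolutions and omits batch normalization, and the function names are hypothetical rather than taken from any library.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights: Matrix, v: Vector) -> Vector:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute relu(F(v) + v), where F is two small linear layers.

    The '+ v' term is the skip connection: even if F contributes
    nothing, the input still reaches the next block unchanged.
    """
    hidden = relu(linear(w1, v))
    f_out = linear(w2, hidden)
    return relu([f + x for f, x in zip(f_out, v)])


# With all-zero weights, F(v) is zero and the block acts as the
# identity for non-negative inputs.
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block([1.5, 2.0], zeros, zeros)
```

Because the block can fall back to the identity, adding more blocks does not have to make optimization harder, which is the intuition usually given for why very deep residual networks remain trainable.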

InceptionV3
InceptionV3 is designed to be computationally efficient while maintaining high accuracy. It consists of multiple parallel branches, each with its own set of convolutional filters of different sizes. The architecture of this pre-trained model is characterized by inception modules, which perform a variety of operations on the input data, and by batch normalization, which helps it achieve high accuracy in image classification. In addition, the Inception-v3 architecture uses "bottleneck layers" to reduce the dimensionality of the feature maps before applying the larger convolutional filters. This helps to reduce the computational cost of the network. The final layers of the Inception-v3 architecture consist of fully connected layers with softmax activation for classification [8], [11], [12].
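The saving from a bottleneck layer can be checked with simple arithmetic. The sketch below counts the weights of a 5×5 convolution applied directly versus the same convolution preceded by a 1×1 bottleneck. The channel counts are hypothetical values chosen for illustration, not the actual InceptionV3 configuration, and bias terms are ignored.

```python
def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Number of weights in a kernel x kernel convolution (no bias)."""
    return kernel * kernel * in_ch * out_ch


# Hypothetical channel counts, for illustration only.
IN_CH, MID_CH, OUT_CH = 256, 64, 128

# 5x5 convolution applied directly to all input channels.
direct = conv_params(5, IN_CH, OUT_CH)

# 1x1 bottleneck reduces the channels first, then the 5x5 convolution.
bottleneck = conv_params(1, IN_CH, MID_CH) + conv_params(5, MID_CH, OUT_CH)

print(f"direct:     {direct:,} weights")
print(f"bottleneck: {bottleneck:,} weights")
print(f"reduction:  {direct / bottleneck:.2f}x")
```

With these assumed sizes, the bottleneck variant needs roughly a quarter of the weights while producing an output of the same shape.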

EfficientNetB3
EfficientNet-B3 uses a compound scaling method to scale the model's architecture jointly in depth (number of layers), width (number of filters per layer), and input image resolution. This allows resources to be allocated in balance across the three dimensions, improving both accuracy and efficiency. EfficientNet-B3 is a scaled-up variant of the EfficientNet-B0 baseline, with a width coefficient of 1.2, a depth coefficient of 1.4, and a nominal input resolution of 300×300 pixels. With more filters per layer and a higher spatial resolution of the feature maps, the model can capture more fine-grained details in the input images. The model consists of stacked convolutional layers interspersed with batch normalization layers and Swish activation functions. Feature maps are convolved using depthwise separable convolutions, which reduce the number of parameters by performing one spatial convolution per channel. Using this approach, the model can capture more complex patterns in the input data while maintaining a relatively small parameter set. EfficientNet-B3 thus aims to achieve better accuracy with less computation, rather than spending more computation for worse accuracy. As in the other members of the family, its depthwise separable convolutions make the model more efficient and result in a smaller model size, quicker inference, and lower memory usage than standard convolutions would. EfficientNet-B3 is a cost-efficient, robust model. In this architecture, 26 MBConv blocks are followed by a convolution layer with batch normalization and activation. An MBConv block is an inverted residual block (a convolution layer, then a depthwise convolution, and then a convolution layer, with a skip connection between the beginning and end of the block). After that, the spatial dimensions of the feature maps are reduced through global average pooling [13]-[14].
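The parameter saving from depthwise separable convolutions can be made concrete with a short calculation. The sketch below compares a standard convolution with a depthwise convolution (one spatial filter per input channel) followed by a 1×1 pointwise convolution. The layer sizes are hypothetical and not taken from the EfficientNet-B3 specification; bias terms are ignored.

```python
def standard_conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights in a standard convolution: every filter sees every channel."""
    return kernel * kernel * in_ch * out_ch


def depthwise_separable_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch   # one spatial filter per channel
    pointwise = in_ch * out_ch            # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {standard / separable:.1f}x fewer weights")
```

For a k×k kernel and a large number of output channels, the ratio approaches k², which is why the saving is close to 9x for 3×3 kernels.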

METHODOLOGY
The dataset in this study contained 3,000 retinal images in 3 classes: Cataract, Glaucoma, and other disease. Several data pre-processing techniques were applied. The models chosen for classification were ResNet50, InceptionV3, and EfficientNetB3, which were pretrained on ImageNet. Figure 2 shows the model architecture after modification.
The models were trained repeatedly with hyperparameter tuning, using 70 percent of the data for training and 30 percent for testing. They were evaluated with a confusion matrix and the derived precision, recall, F1-score, and accuracy to select the best-performing model.
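A 70/30 split of a balanced three-class dataset can be sketched as follows. The text does not say whether the split was stratified or which random seed was used, so the per-class stratification, the seed, and the function name below are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices into train/test, preserving class proportions."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


# 3,000 images with 1,000 per class, as described for the dataset.
labels = ["cataract"] * 1000 + ["glaucoma"] * 1000 + ["other"] * 1000
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under these assumptions the split yields 2,100 training and 900 test images, with 700 and 300 images per class respectively.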

Data Collection and Preparation
3.1.1 Dataset. The primary dataset utilized in this study is from https://www.kaggle.com/datasets/gunavenkatdoddi/eyediseases-classification. Three of its classes are used, giving a total of 3,000 images with 1,000 images in each class.

Image Preprocessing.
To ensure consistency in input data, every image was resized to a standard resolution of 200×200 pixels. To evaluate and track the efficacy of the model's learning, two primary metrics were utilized: accuracy and loss. Accuracy defines the ratio of correct predictions to the total number of predictions made. On the other hand, loss quantifies the difference between the model's predictions and the ground-truth labels. Figure 3 shows the accuracy and loss of EfficientNetB3 being trained for 30 epochs.
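The two tracking metrics can be written out in a few lines. The sketch below assumes that the loss is categorical cross-entropy, which the text names only as one possible choice; the predicted probabilities are made-up toy values, not outputs of the trained models.

```python
import math
from typing import List, Sequence


def accuracy(true_labels: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose highest-probability class is the true class."""
    correct = 0
    for y, p in zip(true_labels, probs):
        predicted = max(range(len(p)), key=lambda k: p[k])
        correct += int(predicted == y)
    return correct / len(true_labels)


def cross_entropy(
    true_labels: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-12
) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = -sum(math.log(max(p[y], eps)) for y, p in zip(true_labels, probs))
    return total / len(true_labels)


# Toy predictions over three classes (illustrative values only).
y_true = [0, 1, 2, 1]
y_prob: List[List[float]] = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],  # misclassified: true class 1, predicted class 0
]

acc = accuracy(y_true, y_prob)
loss = cross_entropy(y_true, y_prob)
print(f"accuracy={acc:.2f} loss={loss:.3f}")
```

Accuracy only counts whether the top prediction is right, while cross-entropy also penalizes low confidence in the correct class, which is why the two curves can move differently during training.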

Predictive Assessment.
Upon training completion, the model was subjected to an evaluative test using novel images representing three classes: Glaucoma, Cataract, and a generic other-disease class. Each successful prediction was recorded. A detailed classification report was generated, shedding light on the model's precision, recall, F1-score, and accuracy for each disease type.
This allowed for a visualization of the model's accuracy across different disease classes.
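The entries of such a classification report can be derived from a confusion matrix, as in the sketch below. The matrix values are small toy numbers invented for illustration and are not results from this study; rows are assumed to be true classes and columns predicted classes.

```python
from typing import Dict, List, Sequence

Matrix = List[List[int]]


def per_class_report(matrix: Matrix, class_names: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per class (rows = true, columns = predicted)."""
    n = len(class_names)
    report: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(class_names):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def overall_accuracy(matrix: Matrix) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Toy confusion matrix (illustrative values only).
classes = ["cataract", "glaucoma", "other"]
toy_matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]

report = per_class_report(toy_matrix, classes)
toy_accuracy = overall_accuracy(toy_matrix)
for name, scores in report.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print("accuracy", round(toy_accuracy, 3))
```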
Based on the empirical results, the EfficientNetB3 model with 30 epochs demonstrated superior classification accuracy on the retinal disease datasets, achieving 99.9% accuracy in training and 96% in testing. While the Inception model yielded a higher test accuracy of 97.33%, multiple rounds of testing revealed that its accuracy exhibited a marginally higher variance compared to other algorithms.

CONCLUSION
This study conclusively demonstrates the effectiveness of the EfficientNet architecture for accurate detection and categorization of retinal diseases, as evidenced by the confusion matrix metrics. There is ample scope for future research to expand the application of these deep learning architectures to the diagnosis of other common eye conditions. The incorporation of such models into telemedicine platforms offers a promising avenue for alleviating healthcare system burdens and enhancing patient outcomes, especially in the context of the ongoing COVID-19 pandemic.
Our findings emphasize the increasing need for sophisticated AI algorithms capable of analyzing intricate retinal images, given the escalating global incidence of vision impairment. Utilizing Convolutional Neural Networks, specifically the EfficientNetB3 model, we attained remarkable classification accuracies of 99.9% in training and 96% in testing phases. These results substantiate the capability of such neural networks in diagnosing retinal diseases and accentuate the necessity for additional research to extend these conclusions to diverse healthcare environments.
Furthermore, the development of explainable algorithms enriches the transparency and interpretability of these AI-based diagnostic tools, thereby augmenting their acceptance and trustworthiness in clinical settings.

2.1.1 Cataracts. Characterized by the clouding of the eye's natural lens, cataracts manifest as a white or cloudy appearance. This opacification compromises normal vision, leading to blurriness or obfuscation. If untreated, or if complications affect surrounding eye structures, it could result in vision loss. Various etiological factors contribute to the development of cataracts, including age-related changes, excessive exposure to sunlight, diabetes, specific medication use, eye injuries, and the degradation of lens proteins [3]-[4].

Figure 1: Cataract, Glaucoma and Other Retinal Diseases Characteristics

Figure 2: Model Architecture after Modification

3.2 Model Architecture and Configuration
3.2.1 Base Model Selection. The foundational architectures chosen for classification were ResNet50, InceptionV3, and EfficientNetB3, which were pretrained on the ImageNet database.

3.2.2 Custom Modifications and Training. To fine-tune each model for the specific task, the top layers of each transfer learning structure were excluded. Subsequent layers, such as Global Max Pooling, Batch Normalization, Dropout, and Dense layers, were appended to adapt the model for the task. Training sessions were conducted over 30 epochs. During each epoch, the model processed data in batches of 16 images.
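The training schedule implied by these settings can be worked out with a short calculation. The sketch assumes that the 70 percent training share is taken from the full 3,000-image dataset and that the final, partially filled batch of each epoch is kept rather than dropped; neither detail is stated explicitly.

```python
import math

TOTAL_IMAGES = 3000      # dataset size reported for the study
TRAIN_FRACTION = 0.7     # 70/30 train/test split
BATCH_SIZE = 16
EPOCHS = 30

train_images = round(TOTAL_IMAGES * TRAIN_FRACTION)

# Assumption: the last, smaller batch of each epoch is kept.
steps_per_epoch = math.ceil(train_images / BATCH_SIZE)
last_batch_size = train_images - (steps_per_epoch - 1) * BATCH_SIZE
total_updates = steps_per_epoch * EPOCHS

print(f"training images:  {train_images}")
print(f"steps per epoch:  {steps_per_epoch}")
print(f"last batch size:  {last_batch_size}")
print(f"weight updates:   {total_updates}")
```

Under these assumptions each model sees 132 batches per epoch, for 3,960 weight updates over the full 30-epoch run.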

Figure 3: The accuracy and loss of EfficientNetB3 over 30 epochs

3.3.3 Model Insight Tools. For a comprehensive understanding of the model's predictions, confusion matrices were constructed for the training and validation datasets.
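A confusion matrix of this kind can be built by counting each (true class, predicted class) pair. The sketch below is a minimal stdlib version with a hypothetical function name; the labels are toy values for illustration, not predictions from the trained models.

```python
from typing import List, Sequence


def confusion_matrix(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    class_names: Sequence[str],
) -> List[List[int]]:
    """Count predictions per (true class, predicted class) pair.

    Rows correspond to true classes and columns to predicted classes,
    both in the order given by class_names.
    """
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


# Toy labels (illustrative only).
classes = ["cataract", "glaucoma", "other"]
y_true = ["cataract", "cataract", "glaucoma", "glaucoma", "other", "other"]
y_pred = ["cataract", "glaucoma", "glaucoma", "glaucoma", "other", "cataract"]

cm = confusion_matrix(y_true, y_pred, classes)
for name, row in zip(classes, cm):
    print(f"{name:>9}: {row}")
```

Diagonal entries count correct predictions; off-diagonal entries show which classes are confused with each other, which is the information the per-class precision and recall are computed from.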