A Systematic Collection of Medical Image Datasets for Deep Learning

The astounding success made by artificial intelligence in healthcare and other fields proves that it can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data dependent and require large datasets for training. Many junior researchers face a lack of data for a variety of reasons. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require several other resources, such as professional equipment and expertise. That makes it difficult for novice and non-medical researchers to have access to medical data. Thus, as comprehensively as possible, this article provides a collection of medical image datasets with their associated challenges for deep learning research. We have collected the information of approximately 300 datasets and challenges mainly reported between 2007 and 2020 and categorized them into four categories: head and neck, chest and abdomen, pathology and blood, and others. The purpose of our work is to provide a list, as up-to-date and complete as possible, that can be used as a reference to easily find the datasets for medical image analysis and the information related to these datasets.


Introduction
Since the invention of medical imaging technology, the field of medicine has entered a new era. Medical imaging began with the adoption of X-rays. With further technical advancements, many other imaging methods, including 3D computed tomography (CT), magnetic resonance imaging (MRI), nuclear medicine, ultrasound, endoscopy, and optical coherence tomography (OCT), were also exploited. Directly or indirectly, these imaging modalities have contributed to the diagnosis and treatment of various diseases, and to research on the human body's structure and intrinsic mechanisms.
Medical images can provide critical insight into the diagnosis and treatment of many diseases. The human body's different reactions to imaging modalities are used to produce scans of the body. Reflection and transmission are commonly used in medical imaging because the reflection or transmission ratios of different body tissues and substances differ. Some other methods acquire images by changing the energy transferred to the body, e.g., magnetic field changes or the rays radiated from a chemical agent.
Before modern AI was applied in medical image analysis, radiologists and pathologists needed to manually look for the critical "biomarkers" in the patient's scans. These "biomarkers", such as tumors and nodules, are the basis for the medics to diagnose and devise treatment plans. Such a diagnostic process needs to be performed by medics with extensive medical knowledge and clinical experience. However, problems such as diagnostic bias and the lack of medical resources are prevalent and cannot be avoided. After the recent breakthroughs in AI (which achieves human-like performance, e.g., for image recognition [1,2,3], and can win games such as Go [4] and real-time strategy games [5]), the development of AI-based automatic medical image analysis algorithms has attracted a lot of attention. Recently, the application of AI in medical image analysis has become one of the major research focuses and has attained many significant achievements [6,7,8].
Many researchers have turned to AI-based medical image analysis methods, seeing them as one possible solution to challenges such as medical resource scarcity and as a way to take advantage of technological progress [9,10,11,12,13]. Traditional medical image analysis focuses on detecting and identifying biomarkers for diagnosis and treatment. AI imitates the medic's diagnosis through classification, segmentation, detection, regression, and other AI tasks in an automated or semi-automated way.
AI has achieved significant performance on many computer vision tasks. This success is yet to be translated to the medical image analysis domain. Deep learning (DL), a branch of AI, is a data-dependent method, as it needs massive training data. However, when DL is applied to medical image analysis, the paucity of labeled data becomes a major challenge and a bottleneck.
Data scarcity is a common problem when applying DL methods to a specific domain, and this problem becomes more severe in the case of medical image analysis. Researchers who apply DL methods to medical image analysis are commonly computer scientists and do not usually have a medical background. They cannot collect data independently because they lack access to medical equipment and patients, and they cannot annotate the acquired data either because they lack the relevant medical knowledge. Furthermore, medical data is owned by institutions that cannot easily make it public due to privacy and ethics restrictions. When researchers evaluate their algorithms on their own private data, the results of their research become incomparable.
To address some of these problems, MICCAI, ISBI, AAPM, and other conferences and institutions have launched many DL-related medical image analysis challenges. These aim to design and develop automatic or semi-automatic algorithms and promote medical image analysis research with computer-aided methods. Concurrently, some researchers and institutions also organize projects to collect medical datasets and publish them for research purposes.
Despite all these developments, it is still challenging for novice medical image analysis researchers to find medical data. This paper addresses this challenge and presents a comprehensive survey of existing medical datasets. The paper also identifies and summarizes medical image analysis challenges. It also provides a pathway to identify the most relevant datasets for evaluation and the suitable methods they need in the respective challenge leaderboard.
This paper refers to other research papers with a number between square brackets and refers to the datasets listed in the tables with numbers between parentheses.
The following sections present the details of the key datasets and challenges. Section 2 summarizes the datasets and challenges, including the years, body parts, tasks, and other information that is relevant to the dataset development. Section 3 discusses the datasets and challenges of the head and neck. Section 4 covers the datasets and challenges related to the chest and abdomen organs. Section 5 examines the datasets and challenges of pathology and blood related tasks. Section 6 introduces other datasets and challenges related to bone, skin, phantom, and animals. We have also created a website with a git repo, which lists these datasets and their respective challenges.

Medical image datasets
In this section, we provide an overview of the image datasets and challenges. Our collection contains over three hundred medical image datasets and challenges organized between 2004 and 2020. This paper focuses mainly on the ones between 2013 and 2020. Subsections 2.1, 2.2, 2.3, and 2.4 provide information about the year, body parts, modalities, and tasks, respectively. In Subsection 2.5, we introduce the sources from where we have collected these datasets and the challenges. Details about the categorization of these image datasets and challenges into four groups are provided in the subsequent sections. We provide a taxonomy of our paper in Figure 1 to help the reader navigate through the different sections.

Fig. 1: An overall taxonomy to outline the organization of the paper.

Years
The timeline of these medical image datasets can be split into two periods, with 2013 as the watershed, following the success of Krizhevsky et al.'s AlexNet [14] in the ILSVRC competition in 2012.
The continuous advancement of deep learning has, to some extent, driven more and more researchers to focus on medical image analysis and indirectly led to an increase in the number of datasets and competitions each year. According to our statistics, the number of datasets and challenges before 2013 is irregular. The main reason is that many datasets developed before 2012 were not aimed at computer-aided diagnosis, for example, ADNI-1 (52), although those data could be used for DL. Therefore, we focus only on the datasets and challenges released from 2013 onward.
Figure 2A shows the statistics of the datasets and challenges per year between 2013 and 2020. As shown in Figure 2A, the number of related datasets and challenges increased year by year because of the progress and success of DL in computer vision and medical image analysis. This led more and more researchers to focus on medical image analysis with DL-based methods, and more and more datasets and challenges covering different body parts and tasks started to appear.
As shown in Figures 2B and 2C, there was an increase not only in the number of datasets and challenges but also in their variety with respect to body parts and types of tasks. Over time, the research focus has moved from simple diagnosis or structural analysis (e.g., segmentation and classification) in the early stages to more complex tasks, or combinations of tasks, that are closer to clinical needs, including classification, segmentation, detection, regression, generation, tracking, and registration. The focus of these datasets and challenges has also broadened from cancer diagnosis to the entire healthcare system. Meanwhile, the organs studied by researchers have expanded from single, simple but important ones, such as the brain and lungs, to many other parts of the human body with different sizes, shapes, and other characteristics.

Body parts
With the success of DL, the number of body parts in focus has increased, as shown in Figure 2B. We also show the most frequently researched body parts in Figure 2E; the top five researched organs are the brain, lung, heart, eye, and liver. These organs have been the focus of research because they are the most important parts of the human body.
In the beginning, the main motivation for researchers to focus on these organs was that a simple diagnosis and a structural study greatly helped in the diagnosis and treatment of cancer (a major threat to human life). Many datasets focus on the brain, lung, and other organs without considering DL, and many challenges focus on simple tasks, such as segmentation and classification. Subsequently, AI proved more competent at tackling complex tasks, and researchers therefore started to focus on several other organs. For example, eye-related diseases, which cause blindness, prompted the collection of eye-related datasets and the release of challenges. Some other datasets and challenges focus on small organs, such as the prostate, which are challenging to analyze due to the low resolution of images.

Fig. 2: Summary of medical image datasets and challenges from 2013 to 2020. Figure 2A shows the number of datasets and challenges published in each year. Figures 2B, 2C, and 2D show the year-by-year trends along with the trends in relative numbers for each of the different categories by year. The numbers listed on the right are the summary counts of each category; they differ from the total numbers because 1) some of the categories are not shown, and 2) a dataset is counted twice if it includes two of the categories. Figures 2E, 2F, and 2G show the most predominant body parts, data modalities, and main tasks with the percentage of their respective datasets.

Modalities
There are several types of medical image modalities. As shown in Figure 2F, the modalities frequently used to acquire medical datasets include Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Ultrasound (US), Endoscopy, Positron Emission Tomography (PET), Computed Radiography (CR), Electrocardiography, and Optical Coherence Tomography (OCT). We introduce these main modalities below and provide a summary at the end of this subsection.

Radiography:
Radiography is an imaging technique based on the difference in attenuation when X-rays pass through the different organs and tissues of the human body. The primary modalities are CR and CT. CR produces a 2D image, and CT produces a volumetric (3D) image. Radiography is the most commonly used method to image the human body. For example, CR is frequently used to diagnose chest-related diseases, such as pneumonia, tuberculosis, and COVID-19. Meanwhile, 3D CT plays an important role in the diagnosis and treatment of cancer and lesions. The advantages of radiography are 1) high resolution of hard tissues (e.g., bones), 2) lower cost, and 3) compatibility with contrast agents; the disadvantages are that 1) X-rays are harmful to human health, 2) X-rays are not ideal for distinguishing between healthy tissues and tumors without the help of contrast agents, and 3) their resolution is limited by the radiation intensity. Moreover, as the main component of human bone is calcium, CT plays an important role in many bone-related diagnoses.
Magnetic resonance: MR images display body structure through the differences in the signals released by the different substances of the imaged organ as the magnetic field is changed. MR has many submodalities, such as T1 and T2. For essential organs and tissues, MR is a commonly used imaging method because it is considered non-invasive, effective, and safe. Due to the principle of MR imaging, MR plays an essential role in the diagnosis of the brain, heart, and soft tissues. Because higher-resolution MR images can be obtained by increasing the magnetic field strength, MR is also suitable for small organs or tissues. However, MR imaging does have disadvantages, such as high cost and incompatibility with metal (e.g., metallic orthopedic implants).
Nuclear medicine: Nuclear medicine captures images through the absorption, by the targeted tissue, of specific chemical components marked with radioactive isotopes. Tumors and healthy tissues absorb different chemical components, so medics administer a specific chemical marked with a radioactive isotope and detect the rays radiated by that chemical. An example is Positron Emission Computed Tomography, i.e., PET, which performs imaging by capturing the radiation produced by fluorodeoxyglucose or other similar contrast agents absorbed by the tissue or tumor. Nuclear medicine is good at imaging regions of interest, such as tumors, but its disadvantages are high cost and low resolution.
Ultrasound: Ultrasound operates by acquiring the differences in the absorption and reflection of ultrasound waves when applied to tissues. It is widely used in imaging the heart and fetus because ultrasound causes no damage to these parts and provides real-time imaging. Nevertheless, its main disadvantage is the noise caused by reflections from the irregular shapes of organs and tissues, which interferes with their imaging.
Eye-related modalities: An OCT image is obtained by using low-coherence light to capture 2D and 3D micrometer-resolution images within optical scattering media to diagnose eye-related diseases. The fundus photo is also used for diagnosis purposes. These two methods are non-invasive eye-specific imaging modalities.
Pathology: Pathological data is the gold standard in diagnosing diseases. It is acquired by photographing stained tissue slides under a microscope to show cell-level features. Pathology is used in the cell-level diagnosis of cancer and tumors.
Other modalities: Other imaging modalities, such as endoscopy, are also common but specific to certain body parts, and provide the medics with various biomarkers to make critical decisions when diagnosing, curing, and researching.
Overall, MR, CT, and other modalities are the most commonly used imaging modalities. MR can provide sharp images of soft tissues without harmful radiation. It is therefore widely used in the imaging of the brain, heart, and many other small organs. CT is an economical and simple imaging approach, and it is widely used for the diagnosis of cancer, e.g., in the neck, chest, and abdomen. A pathology image is different from MR and CT because it is a cell-level imaging method. Pathology is widely used in cancer-related diagnosis.

Tasks
According to our analysis, our collected datasets and challenges have been used for the tasks of classification, prediction, detection, segmentation, localization, characterization, up-sampling, tracking, registration, regression, estimation, coding, automatic annotation, and other tasks. As Figure 2G shows, we grouped these tasks into seven categories: classification, segmentation, detection, regression, generation, registration, and tracking. The following paragraphs briefly describe each task.
Classification: Classification is used for qualitative analysis. According to pre-defined specific rules, the classification task aims to group medical images or particular regions of an image into two or more distinct categories. The classification task can be used alone for medical image analysis or as a subsequent task after other lower-level tasks, such as segmentation and detection, in order to analyze the results and further extract features. The classification task is sometimes described with other terms, such as detection and prediction. Detection tasks in this sense (i.e., those that are really classification) are different from the detection task introduced below, although the same word is used for both. Typical examples of classification tasks include AD prediction and the attribute classification of pulmonary nodules. AD prediction aims to group MR images into Alzheimer's disease (AD) and normal cognition (NC). The attribute classification of pulmonary nodules aims to analyze the pathology attributes of pulmonary nodules. Classification performance measures mainly include accuracy, precision, specificity, sensitivity, F-score, ROC, and AUC. All these measures are based on four basic counts: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
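To make the relationship between the four basic counts and the derived measures concrete, the following is a minimal Python sketch. The function names and the toy labels are illustrative assumptions, not part of any dataset or challenge toolkit; threshold-based measures such as ROC and AUC are not covered here.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:  # predicted negative, actually positive
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive the common measures from the four basic counts."""
    def safe_div(num, den):
        # Return 0.0 when a measure is undefined (empty denominator).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)   # also called recall
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical example: 1 = AD, 0 = NC for eight subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

In screening settings, sensitivity and specificity are usually reported together, since accuracy alone can look high on a heavily imbalanced dataset.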

Segmentation:
The segmentation task can be regarded as a pixel-level or voxel-level classification task, but the difference is that the segmentation task is constrained by the context. It aims to split an image into different areas or to contour specific regions. The regions can contain a tumor, tissue, or other specific targets. The results of the segmentation task consist of areas and boundaries. Since segmentation can be seen as a pixel-level classification, the average precision (AP) can be used as a metric. Other performance metrics include intersection over union (IoU), Dice index, Jaccard index, Hausdorff distance, and average surface distance.
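As an illustration of the overlap-based metrics, the sketch below computes the Dice index and IoU for two binary masks given as flat lists of 0/1 values. The function name and the toy masks are assumptions made for this example. Note that the Jaccard index is the same quantity as IoU, and that Dice and IoU are related by Dice = 2·IoU / (1 + IoU).

```python
def dice_and_iou(pred, target):
    """Dice index and IoU for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels/voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * intersection / (pred_size + target_size) if union else 1.0
    iou = intersection / union if union else 1.0
    return dice, iou


# Toy 1D "masks": three foreground pixels each, two of them shared.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

A 2D or 3D mask can be flattened before calling the function; the overlap measures do not depend on the spatial arrangement, which is why boundary-based measures such as the Hausdorff distance are often reported alongside them.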

Detection:
The detection task aims to find an object of interest, and it usually also needs to classify that object (classification task). In this work, we categorize as detection the tasks which aim to determine the location of the object of interest with a bounding box or a point. The detection task is sometimes referred to as a localization task. A typical example of detection is pulmonary nodule detection, which aims to find the pulmonary nodules in chest CT images and annotate the nodules with a bounding box. The performance measures used in detection tasks mainly include the intersection over union (IoU), mean Average Precision (mAP), precision and recall, false positive rate, receiver operating characteristic (ROC) curve, and other metrics. For tasks that locate an object without a boundary, the Euclidean distance is the most commonly used measure.
Regression: Classification is used for qualitative analysis, while regression is used for quantitative analysis. A typical example is the estimation of the volume of a lesion. For the regression task, the root mean square error (RMSE), mean absolute error, and correlation coefficient are the most commonly used metrics.
Tracking: The tracking task aims to locate specific targets, but tracking is a dynamic process and is therefore different from the detection task. That means the tracking algorithms need to detect or localize targets in different frames. For medical image analysis, the tracking tasks include the tracking of organs and tissues. The tracked target is not necessarily a single point; it can also be an area, e.g., every part of an organ or tissue. An example is the tracking of the lung when the subject breathes.
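The two localization measures mentioned above can be sketched as follows. The box format (x_min, y_min, x_max, y_max) and the function names are assumptions chosen for this example; individual challenges may define boxes and matching rules differently.

```python
import math


def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def point_error(predicted, reference):
    """Euclidean distance between a predicted and a reference point."""
    return math.dist(predicted, reference)


# Hypothetical predicted and annotated nodule boxes (pixel coordinates).
predicted_box = (0, 0, 10, 10)
annotated_box = (5, 5, 15, 15)
print(box_iou(predicted_box, annotated_box))

# Hypothetical landmark prediction in 3D (e.g., millimetres).
print(point_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

In a typical evaluation, a predicted box is counted as a true positive when its IoU with an annotated box exceeds a chosen threshold, and precision, recall, and mAP are then computed from those matches.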
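A minimal sketch of the three regression measures is given below, using only the Python standard library. The function names and the toy lesion volumes are assumptions for illustration, and the correlation coefficient is taken here to be the Pearson coefficient.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and prediction."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical lesion volumes (e.g., in millilitres): reference vs. estimate.
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.5, 2.5, 4.5]
print(rmse(reference, estimate), mae(reference, estimate), pearson_r(reference, estimate))
```

RMSE penalizes large errors more strongly than MAE, while the correlation coefficient ignores a constant bias, so the three measures are complementary.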

Generation:
The image data generation task has many different aims, but for simplicity we categorize all of these aims under the "generation task" because they focus on generating image data from other image data. Typical generation tasks include 1) generating a T2-weighted image from T1-weighted images and 2) generating a pathology image stained with one stain from an image stained with another stain.

Registration:
The image registration task aims to align one image with another image, i.e., to find a transformation (e.g., rotation and translation) that aligns the two images. Registration is a necessary step for computer-aided diagnosis algorithms that use multiple modalities. During medical scanning, movement of the human body cannot be avoided and is a challenge. At the same time, images cannot be acquired instantly. As a result, images taken from different viewpoints, or with two or more modalities, cannot be aligned directly. Therefore, researchers rely on registration techniques to solve these alignment problems.
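The notion of a rotation-and-translation transformation can be illustrated with a small 2D sketch. It does not estimate the transformation (which is the actual registration problem); it only shows, under the assumption that the rotation angle and translation are already known, how applying the inverse rigid transformation brings landmark points back into alignment. All function names and landmark values are hypothetical.

```python
import math


def rigid_transform(points, angle_rad, tx, ty):
    """Rotate 2D points by angle_rad about the origin, then translate by (tx, ty)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def inverse_rigid_transform(points, angle_rad, tx, ty):
    """Undo rigid_transform: subtract the translation, then rotate back."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * (x - tx) + s * (y - ty), -s * (x - tx) + c * (y - ty))
            for x, y in points]


def mean_landmark_error(points_a, points_b):
    """Mean Euclidean distance between corresponding landmarks."""
    return sum(math.dist(p, q) for p, q in zip(points_a, points_b)) / len(points_a)


# Hypothetical landmarks in the fixed image and their moved counterparts.
fixed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moving = rigid_transform(fixed, math.pi / 2, 2.0, 3.0)

before = mean_landmark_error(fixed, moving)
aligned = inverse_rigid_transform(moving, math.pi / 2, 2.0, 3.0)
after = mean_landmark_error(fixed, aligned)
print(before, after)
```

A registration algorithm searches for the parameters (here the angle and the translation) that minimize such an alignment error or maximize an image similarity measure; non-rigid registration replaces the rigid model with a deformable one.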

Source and Term
We collected the datasets and challenges mainly from The Cancer Imaging Archive [15], Grand Challenge, Kaggle, OpenNeuro, PhysioNet [16], and Codalab.
Our original records comprised four to five hundred datasets and challenges, and we removed those that are not suitable for DL and AI methods. We then categorized the remaining datasets and challenges into different groups. Categorizing the datasets and challenges is not easy because all of them are derived from clinical research sources. Thus, we used an asymmetric categorization to group these datasets and challenges into four groups, as shown in Figure 1. This means that we did not use the same sub-taxonomy in each category or sub-category.
First, we split the medical datasets and challenges into two groups: body-level and cell-level (Section 5), according to the imaged body part. The body-level datasets focus on specific tissues, while the cell-level ones focus on cells. Second, we grouped the datasets and challenges of the brain, eye, and neck into one group (Section 3), because these are parts of the head and neck. Third, we organized the datasets and challenges related to the chest and abdomen into the same group (Section 4). These datasets and challenges relate to diagnosis, anatomical segmentation, and treatment. Finally, we grouped the datasets and challenges that cannot be categorized into the above groups under "other" (Section 6); these are related to the skin, bone, phantom, and animals.
The introduction of each group and sub-group mainly covers the type of modality, the task, the disease, and the body part. However, not all the groups of datasets can be introduced in that way. For some groups, we introduce the datasets and challenges according to the domain-specific problems. For example, we categorize the pathology datasets into microcosmic and macrocosmic tasks.

Head and neck related datasets and challenges
The head and neck are significant parts of the human body because many essential organs, glands, and tissues are located there. Much image analysis research relates to the head and neck. To make effective use of computers for research, diagnosis, and treatment, many researchers have released datasets and challenges, for example, for 1) the analysis of tissue structure and function (2,3,4,6) and 2) disease diagnosis (30,39,47).
Because the brain controls emotions, actions, and the functions of other organs, it is a particularly significant area. First, we introduce the datasets and challenges related to the analysis of the brain structure, function, imaging, and other basic tasks in Subsection 3.1. Second, we introduce the datasets and challenges related to brain disease diagnosis in Subsection 3.2.
Moreover, since the eyes are crucial to our vision, the computer-aided diagnosis of eye-related diseases is also an important research focus. The eye-related datasets and challenges are covered in Subsection 3.3. We introduce other datasets and challenges of the neck and the datasets related to the brain's behavior and cognition in Subsection 3.4.

Structural analysis tasks of the brain
The basic analysis and processing of brain medical images are clinically critical for diagnosis, treatment, and other brain-related analysis tasks. The datasets and challenges we discuss are mainly for segmentation tasks and center around the brain structure. In contrast, some datasets focus on imaging, including MR imaging acceleration, the non-linear registration of different resolutions, and tissue reconstruction. One of the most popular tasks is the segmentation of white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF), and the respective datasets and challenges are introduced in Subsection 3.1.1. Meanwhile, the segmentation of other tissues and functional areas is also a focus of research, and the related datasets and challenges are discussed in Subsection 3.1.2. Subsection 3.1.3 describes the other basic tasks. Table 1 shows the datasets and challenges of these basic tasks.

Segmentation of white and gray matter
The segmentation of WM, GM, and CSF has great significance for brain structure research and computer-aided diagnosis, particularly using AI. Similarly, for AI algorithms, it is also of great significance to understand the human brain's structure. Therefore, MICCAI and others have held many challenges with this research focus, so that researchers can design automatic algorithms to segment magnetic resonance images into different parts. We introduce these datasets and challenges with respect to their modalities and tasks.

Modality:
The datasets and challenges which focus on WM, GM, and CSF segmentation usually provide MR images. Challenges (2,3,4,5,6,7) provide mainly two modalities, T1 and T2, while datasets (1, 8) only provide T1 for the white matter hyperintensities segmentation task. Note that MR scans are sensitive to the hydrogen atom, and this property can effectively help image analysts to distinguish between different tissues and parts of the image. Moreover, the names "white matter" and "gray matter" refer to the color of the respective tissues.

Task:
The main focus of these datasets and challenges is the segmentation of WM, GM, and CSF. However, they do not only focus on that. Challenges (1,4,5,6,7) also provide annotations of other parts of the brain, including basal ganglia, white matter lesions, cerebellum, and infarction. One of the challenges for segmentation is the presence of a lesion, because of the unnatural characterization of lesions. Well-annotated data can help AI to overcome this problem and also achieve more robust results. Challenges (5,7) use MR images of the neonatal brain, and consider tissue volumes as an indicator of long-term neurodevelopmental performance [22].
Performance metric: For the segmentation task, the Dice score is one of the most commonly used metrics, and all these datasets and challenges adopt it as a performance measure. Besides the Dice score, datasets (4, 6, 8) also use the Hausdorff distance and volumetric similarity as metrics; datasets (2, 3) use the average Hausdorff distance and the average surface distance among their metrics; moreover, dataset (8) also uses sensitivity and F1-score as metrics for performance evaluation.
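For readers unfamiliar with the two less common measures named above, the following sketch shows one widely used definition of each. It assumes that the Hausdorff distance is computed by brute force between two sets of boundary points, and that volumetric similarity is defined as 1 − |V_a − V_b| / (V_a + V_b); individual challenges may use variants (for example, a percentile-based Hausdorff distance), so the function names and definitions here are illustrative only.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from any point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


def volumetric_similarity(volume_a, volume_b):
    """1 - |Va - Vb| / (Va + Vb); equals 1.0 when the volumes are equal."""
    total = volume_a + volume_b
    return 1.0 - abs(volume_a - volume_b) / total if total else 1.0


# Hypothetical boundary points of a predicted and a reference contour.
predicted_boundary = [(0.0, 0.0), (1.0, 0.0)]
reference_boundary = [(0.0, 0.0), (4.0, 0.0)]
print(hausdorff_distance(predicted_boundary, reference_boundary))

# Hypothetical segmented volumes in voxels.
print(volumetric_similarity(80, 120))
```

The Hausdorff distance is sensitive to a single outlying boundary point, whereas volumetric similarity ignores where the segmented voxels are located, which is why both are typically reported together with the Dice score.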

Segmentation of functional areas & other tissues
The segmentation of functional areas and tissues is also essential for brain-related research and computer-aided diagnosis. In this subsection, we introduce the datasets and challenges that are related to the segmentation of functional areas and tissues.
Tissues segmentation: While WM, GM, and CSF were introduced in Subsection 3.1.1, the segmentation of other brain tissues is also an active research area. Challenges (1,4,6,7) aim to segment brain images into different tissues, including ventricles, cerebellum, brainstem, and basal ganglia. These challenges provide MR images and the voxel-level annotations of the regions of interest with thirty or forty scans. Because these regions are essential for brain health, researchers need to overcome the challenges related to their size and shape in order to segment them. Dataset (9) focuses on cerebellum segmentation from diffusion-weighted images (DWI), while dataset (12) focuses on the segmentation of the caudate from brain MR images.

Functional areas:
The segmentation of the human brain cortex into different functional areas is of great significance in education, clinical research, treatment, and other applications. Datasets (10,11) provide images and annotations for the design of automatic algorithms to segment the brain cortex into different functional areas. Dataset (10) uses the DKT protocol [24], which is modified from the DK protocol [31]; the DKT protocol includes 31 labels, details of which are listed at https://mindboggle.readthedocs.io/en/latest/labels.html. Dataset (11) is a commercial dataset for research in the segmentation of functional areas of the brain cortex.

Imaging-related tasks
In addition to the segmentation of brain tissues and functional areas, some of the datasets and challenges also focus on generation, registration, and tractography.
Generation: Datasets and challenges (14,17,18) aim to accelerate MR imaging or generate high-resolution MR images from low-resolution ones. Usually, high-resolution imaging incurs a higher cost, while low-resolution imaging is cheaper but affects the analytical judgment and may lead to an incorrect diagnosis. These challenges provide many low-resolution scans to allow researchers to design algorithms that convert or map low-resolution images onto higher-resolution ones. These datasets and challenges mainly focus on generation tasks. Another focus is cranioplasty (13), which aims to generate the missing part of a broken skull from CT images of models of the broken skull. Other datasets and challenges (15,19,20,21) focus on the reconstruction of MR images.

Registration:
The registration between different modalities is another research focus. Challenges (22,23) focus on the registration between ultrasound data and MR images of the brain. Cross-modality registration is difficult because the subject is not absolutely static. Moreover, MR is a 3D volume imaging modality and hence is different from ultrasound, which is a 2D imaging modality. Thus, these challenges focus on establishing the topological relation between the preoperative MR image and the intraoperative ultrasound. Challenge (19) also focuses on diffusion MR image registration to eliminate differences between different vendors' hardware devices and protocols.
Tractography: Tractography is another segmentation task and focuses on the segmentation and imaging of the fibers in the WM. Dataset (24) aims to segment the fiber bundles from brain images, including phantom, squirrel monkey, and macaque, while challenges (25,26) focus on tractography with DTI, another type of MR image.

Brain diseases related datasets and challenges
Besides the structural analysis and image processing tasks, computer-aided diagnosis is also a research focus in healthcare. Medical image analysis plays a critical role in clinical research, diagnosis, and treatment. The datasets and challenges we have included are for two tasks: 1) the segmentation of lesions and tumors and 2) the classification of diseases. For the segmentation task, the respective datasets and challenges focus on tumor and lesion segmentation of the human brain, marking the lesion's contour for diagnosis and treatment; the relevant details are given in Subsection 3.2.1. For classification tasks, the datasets and challenges have been used for the development of automatic algorithms to classify or predict diseases from medical images, and these datasets and challenges are presented in Subsection 3.2.2.

Datasets for segmentation of tumors and lesions
Tumors and lesions in the brain threaten people's health and safety, and image analysis is an effective way to diagnose the relevant diseases. In this subsection, related datasets and challenges are introduced; they are reported in Table 2.
Glioma datasets and challenges: Gliomas are one of the most common brain malignancies in adults. Therefore, many challenges and datasets focus on the segmentation of glioma for its diagnosis and treatment. The BraTS challenge series (30,31,32,33,34,35,36,37,38) has been held since 2012 to segment glioma. The difficulty of this segmentation task stems from the heterogeneous appearance and shape of gliomas. This heterogeneity shows in the tumor's shape, its appearance across modalities, and its many histological sub-regions, such as the peritumoral edema, the necrotic core, and the enhancing and non-enhancing tumor core. Therefore, this series of challenges provides multi-modal MR scans to help researchers design and train algorithms to segment tumors and their sub-regions. The tasks of this challenge series include 1) low- and high-grade glioma segmentation (37,38), 2) survival prediction from pre-operative images (32,33), and 3) the quantification of segmentation uncertainty (30,31). Besides the BraTS challenge series, dataset (47) targets the segmentation of low-grade glioma and provides T1-weighted and T2-weighted MR images together with the biopsy-proven gene status of each subject, determined by fluorescence in-situ hybridization, a. k. a. FISH [46]. Dataset (46) focuses on brain tumor processing and aims to support the design and evaluation of DL-based automatic algorithms for glioblastoma segmentation and further research.
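Segmentation of tumor sub-regions is typically scored per label with an overlap measure such as the Dice coefficient. The following is a minimal sketch of that computation on flattened label maps; the integer label codes (1, 2, 3) and the toy arrays are hypothetical and do not reproduce the label encoding or the official evaluation of any particular challenge.

```python
def dice_score(prediction, reference, label):
    """Dice coefficient 2|P ∩ R| / (|P| + |R|) for one label.

    `prediction` and `reference` are flattened label maps of equal length.
    Returns 1.0 when the label is absent from both (a convention chosen here).
    """
    pred = [p == label for p in prediction]
    ref = [r == label for r in reference]
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(pred) + sum(ref)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical codes: 0 = background, 1 = necrotic core, 2 = edema, 3 = enhancing tumor.
reference = [0, 1, 1, 2, 2, 2, 3, 0]
prediction = [0, 1, 2, 2, 2, 0, 3, 3]

scores = {label: dice_score(prediction, reference, label) for label in (1, 2, 3)}
```

Reporting one score per sub-region, rather than a single score for the whole tumor, makes it visible when a model handles large regions well but misses small ones.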

Ischemic stroke lesion datasets and challenges:
Similar to tumor segmentation, brain lesion segmentation also focuses on detecting brain abnormalities. The difference is that lesion segmentation deals with damaged tissues. Challenges (39,40,41,42,48) focus on stroke lesion segmentation because stroke is also life-threatening and can disable surviving patients.
Stroke is often associated with high socioeconomic costs and disabilities. Automatic analysis algorithms help to diagnose and treat stroke, since its manifestation is triggered by local thrombosis, hemodynamic factors, or embolic causes. In MR images, the infarct core can be identified with diffusion MR images, while the penumbra (which can be treated) can be characterized by perfusion MR images. The challenge ISLES 2015 (42) has two subtasks, sub-acute ischemic stroke lesion segmentation and acute stroke outcome/penumbra estimation, and provides 50 and 60 multi-modality MR scans for training and validation, respectively, for these two subtasks. The subsequent year's challenge, ISLES 2016 (41), focuses on the segmentation of lesions and the prediction of the degree of disability. This challenge provides about 70 scans, including clinical parameters and MR modalities, such as DWI, ADC, and perfusion maps. The challenge ISLES 2017 (40) focuses on segmentation with acute MR images, and ISLES 2018 (39) focuses on segmentation based on acute CT perfusion data. Moreover, dataset (48) focuses on the segmentation of the brain after stroke for further treatments.
Intracranial hemorrhage related datasets: Intracranial hemorrhage is another medical condition that affects health. Dataset and challenge (43,44) focus on the detection and segmentation of intracranial hemorrhage to help medics locate the hemorrhage regions and decide on a treatment plan. Dataset (45) also provides data for classifying CT images as normal or hemorrhage.

Multiple sclerosis lesion related datasets:
Multiple sclerosis lesions are another kind of brain lesion; they are not life-threatening but can cause disabilities. Datasets and challenges (49,50,51) address multiple sclerosis lesion segmentation with multi-modality MR data (T1w, T2w, FLAIR, etc.).

Classification of brain disease
Besides tumor and lesion segmentation, brain disease classification also plays an essential role in healthcare. Brain-related diseases, e. g., Alzheimer's disease (AD) [63,64,65,66] and Parkinson's disease (PD), have a severe effect on patients' health and lives. Therefore, effective diagnosis and early intervention can reduce the damage to patients' health, the burden on their families, and the economic impact on society. In this section, we first introduce the datasets and challenges of AD (52,53,54,55,56,62), and then we introduce other diseases (63,64,65). Table 3 shows the relevant challenges and datasets.
Table footnotes: For BraTS 17 to 20, the T2 modality includes the T2 image and the T2-FLAIR image. For BraTS 12 to 16, the T1 modality includes the T1 image and the T1c image.
Alzheimer's disease: AD affects a person's behavior, cognition, memory, and daily life activities. This progressive neurodegenerative disorder disrupts patients' daily lives: they lose track of who they are and what they should do, and the disease progresses until they forget everything they know. It takes an unbearable toll on the patient and imposes a high cost on their loved ones and on society. For example, according to [67], AD was the sixth leading cause of death in the U.S. in 2018 and cost more than two to three hundred billion U.S. dollars. Therefore, researchers are doing everything they can to explore the causes of AD and its treatments. Diagnosis based on medical images has become a research focus because early diagnosis and intervention significantly affect the progress of this disease. Hence, many researchers work on the classification, i. e., prediction, of AD using brain images. The datasets mainly include the "Alzheimer's Disease Neuroimaging Initiative (ADNI)" and the "Open Access Series of Imaging Studies (OASIS)".
The ADNI is a series of projects that aim to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of AD. It includes four stages: ADNI-1 (52), ADNI-GO (53), ADNI-2 (54), and ADNI-3 (55). These projects provide brain image data for researchers, and the image modalities include MR (T1 and T2) and PET (FDG, PIB, Florbetapir, and AV-1451). Together, these four stages comprise 1400 subjects. The subjects can be categorized into normal cognition (NC), mild cognitive impairment (MCI), and AD, where MCI can be split into early mild cognitive impairment (EMCI) and late mild cognitive impairment (LMCI).
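The label hierarchy described above (NC, MCI with its EMCI and LMCI subtypes, and AD) determines whether a classifier is trained on three or four classes. The sketch below shows one way such a mapping could be represented; the dictionary name `COARSE_LABEL`, the helper `to_coarse_labels`, and the toy subject list are illustrative assumptions, not part of any ADNI tooling.

```python
from collections import Counter

# Map each fine-grained diagnostic label to its coarse category.
COARSE_LABEL = {
    "NC": "NC",     # normal cognition
    "EMCI": "MCI",  # early mild cognitive impairment
    "LMCI": "MCI",  # late mild cognitive impairment
    "AD": "AD",     # Alzheimer's disease
}


def to_coarse_labels(labels):
    """Collapse fine-grained labels into the three-class NC/MCI/AD scheme."""
    return [COARSE_LABEL[label] for label in labels]


# Hypothetical per-subject labels, for illustration only.
subjects = ["NC", "EMCI", "LMCI", "AD", "EMCI"]
coarse = to_coarse_labels(subjects)
counts = Counter(coarse)  # class distribution after merging the MCI subtypes
```

Keeping the fine-grained labels and deriving the coarse ones on demand lets the same data serve both the three-class and the four-class formulation.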
The OASIS is a series of projects aiming to provide neuroimaging data of the brain that researchers can freely access. OASIS released three datasets, named OASIS-1, OASIS-2, and OASIS-3. All three datasets are related to AD, but they are also used for functional area segmentation and other tasks. The OASIS-1 (56) contains 418 subjects aged from 18 to 96; among the subjects older than 60, 100 were diagnosed with AD. The dataset includes 434 MR sessions. The OASIS-2 (57) contains 150 subjects, aged between 60 and 96, and each subject includes three or four MR sessions (T1). About 72 subjects were diagnosed as normal, while 51 subjects were diagnosed with AD. In addition, 14 subjects were diagnosed as normal but were characterized as AD at a later visit. The OASIS-3 (58) includes more than 1000 subjects, more than 2000 MR sessions (T1w, T2w, FLAIR, etc.), and more than 1500 PET sessions (PIB, AV45, and FDG). The dataset includes 609 normal subjects and 489 AD subjects.
Moreover, there are many other challenges based on ADNI, OASIS, or independent datasets. Challenge (60) is based on ADNI and aims at the prediction of the longitudinal evolution. Dataset (61) is based on OASIS and is released on Kaggle for the classification of AD. Challenge (62) is an independent AD-related challenge to classify subjects into NC, MCI, and AD.
Other diseases: Similar to AD, other brain diseases are also important from the diagnosis and treatment perspective. However, the number of datasets and challenges for these diseases is not as large as for AD. A few datasets focus on Parkinson's disease (PD) and spinocerebellar ataxia type II (SCA2). Datasets (63,64) provide MR images of PD with classification labels. Dataset (65) provides images and classification labels of spinocerebellar ataxia-II, i. e., SCA2. Dataset (66) provides images and annotations for the diagnosis of mild traumatic brain injury.

Eye related datasets and challenges
As the human body's imaging sensor, the eye is essential, and eye diseases may lead to blindness. We introduce the relevant challenges and datasets in this subsection and list them in Table 4.
Datasets according to the modality: With regard to the eye-related datasets and challenges, the main modalities used are the fundus photo (70,72,73,74,75,76,77,82,84) and OCT (71,78,79,81,83). The fundus photo can help medics evaluate the eye's health and locate retinal lesions because it clearly shows the important parts of the eye, such as the blood vessels and the optic disc. OCT is a new imaging approach that is safe for the eye and shows the retinal tissues in detail. However, it also has disadvantages: it is not suitable for diagnosing microangioma or for planning retinal laser photocoagulation treatment.
Datasets according to the analysis task: These datasets and challenges can be used for four tasks.
2) Segmentation is another task, which provides more information compared to classification. Datasets and challenges (70,72,74,77,81,83) focus on the segmentation of the tissues and lesions for further diagnosis and disease analysis.
3) Datasets and challenges (70, 71, 74, 76, 77, 84) focus on the detection of lesions or other landmarks. These tasks help medics locate key targets, such as areas and tissues, for effective diagnosis or provide feature details for other automated algorithms.
4) Unlike other tasks, the last one focuses on the annotation of the tools used for eye-related surgery (80).
Among the diseases covered is hypertension (73). Besides these diseases, dataset (80) aims at the annotation of images.

Datasets and challenges of other Subjects
Besides the brain's structural analysis, the image processing, and the computer-aided diagnosis tasks, another important research focus is the human neck because it holds many essential glands and organs. This subsection discusses the datasets and challenges of the neck and teeth, covered in Subsection 3.4.1 and Subsection 3.4.2, respectively. Moreover, many researchers are working on the analysis of behavior and cognition with DL-based methods. We discuss the details in Subsection 3.4.3.

Neck related datasets
The neck is also essential for our health. It holds many glands and organs, and when these become abnormal, effective diagnosis and segmentation play an essential role in their treatment. The related image datasets and challenges are listed in Table 5.
Datasets and challenges (85,87,88,90,94,95) focus on the segmentation of glands and of the lesions and tumors in the relevant glands. Dataset (89) focuses on the binary classification of tumor vs. normal. Challenge (86) aims at thyroid gland nodule detection with ultrasound images and videos. Challenge (91) focuses on nerve segmentation in the neck, while challenge (96) focuses on evaluating the carotid bifurcation.

Cephalometric and teeth related datasets
Challenges (92,93) focus on the diagnosis of dental X-Ray images. The main tasks of these two challenges include landmark localization and caries segmentation. Challenge (92) provides around 400 cephalometric X-Ray images with landmarks annotated by two experienced dentists. Challenge (93) provides about 120 bitewing images with experts' annotations of different parts of the teeth.

Behavior and cognition datasets
To understand what we see, hear, smell, and feel, the brain relies on its neurons to process stimulations and to answer what, where, why, and when questions about a scene. Many researchers now use Artificial Neural Networks as a research method to analyze the relationship between brain activities and stimulation. They use functional MR images to scan brain activity, analyze the hemodynamic feedback, and identify the areas of neurons that react. Therefore, the analysis of the brain's reactions to a specific stimulation is an important research focus. Researchers use DL to detect or decode the stimulation of subjects to work out the brain's functionality. The related datasets are listed in Table 6. Some datasets (98,103,107) focus on classifying the stimulations or the subject's attributes based on the subject's functional MR images. Dataset (98) aims to identify whether the subject is a beginner or an expert in programming via the reaction of their brain to source code. Dataset (103) focuses on distinguishing subjects with depression from subjects without depression using audio stimulations and analyzing the subjects' brain activity. Dataset (107) works on the influence of cannabis on the brain.
Datasets (97, 99, 101, 102, 104, 105, 106) focus on the encoding of the stimulations, i. e., the decoding of brain activity. Datasets (101,104,105) aim to reconstruct what subjects have seen from their brain activity, using DL-based methods on functional MR images. On the other hand, datasets (99,106) work on the encoding of faces that subjects have seen from functional MR images with similar modalities.

Chest and abdomen related datasets and challenges
There are many vital organs in the chest and abdomen. For example, the heart is responsible for the blood supply; the lungs are responsible for breathing; the kidneys are responsible for the production of urine to eliminate toxins from the body. Therefore, the medical image analysis of organs in the chest and abdomen is an important research focus. Most of the tasks are computer-aided diagnosis, with classification, detection, and segmentation of lesions being the most targeted tasks.
Many datasets and challenges aim to segment one or more organs in the chest and abdomen for diagnosis or treatment planning. Subsection 4.1 discusses the datasets and challenges for segmentation. Subsection 4.2 introduces the datasets and challenges which focus on the diagnosis of organs in the chest and abdomen. Finally, Subsection 4.3 describes the datasets and challenges of the chest and abdomen that are not categorized above, including regression, tracking, registration, and other tasks related to the chest and abdomen organs.
Table 6 (rows recovered from the page layout): (106) Adjudicating between face-coding models with individual-face fMRI responses [106], 2018, decoding face from brain activity, vision. (107) T1w structural MRI study of cannabis users at baseline and 3 years follow up [107], 2018, impact of cannabis on brain, cannabis. Footnote 1: DWI, field map.

Datasets for chest & abdomen organ segmentation
This subsection covers the datasets and challenges of the chest and abdomen organs that are used for anatomic segmentation tasks. The anatomic segmentation tasks include organ contour segmentation (Subsection 4.1.1) and organ segmentation (Subsection 4.1.2). Contour segmentation differs from organ segmentation: the former aims to separate an organ from the background or to mark the boundaries between multiple organs and the background, whereas the latter aims to segment the organ into different parts at the anatomical level. Table 7 presents the datasets and challenges that are used for the segmentation of the chest and abdomen organs.
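The relationship between the two annotation types can be made concrete with a small sketch: given a label map that splits an organ into parts, the whole-organ mask and its contour pixels follow directly. The function names, the 4-neighbour definition of a contour pixel, and the toy label map are assumptions chosen for illustration.

```python
def organ_mask_from_parts(part_labels):
    """Binary organ mask: 1 wherever any part label (> 0) is present."""
    return [[1 if value > 0 else 0 for value in row] for row in part_labels]


def contour_pixels(mask):
    """Foreground pixels with at least one 4-neighbour that is background
    or lies outside the image."""
    rows, cols = len(mask), len(mask[0])
    contour = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    contour.append((r, c))
                    break
    return contour


# Hypothetical part annotation: 0 = background, 1 and 2 = two parts of one organ.
parts = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 0],
]
mask = organ_mask_from_parts(parts)
contour = contour_pixels(mask)
```

The reverse direction is not possible: a contour annotation alone does not say where one anatomical part ends and the next begins, which is why part-level datasets require additional labeling effort.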

Datasets of chest and abdomen organs
Organ contour segmentation provides necessary information for surgical preplanning and diagnosis. A well-segmented organ contour provides a precise mask, which helps to produce accurate segmentation results for diagnosis, treatment, and operation. This subsection introduces datasets and challenges for the contour segmentation of a single organ and of multiple organs.
Chest & abdomen datasets according to the organ: The datasets and challenges that we have covered are shown here. The following organs and parts are involved:
-Liver (
Generally, these datasets and challenges focus on the larger organs, such as the liver and the lungs, with the aim of diagnosing tumors and lesions, and contour segmentation serves as a pre-processing step. However, it is challenging to segment smaller organs in low-resolution images. This matters particularly for radiotherapy, where an incorrect contour of a small organ can lead to severe consequences (e. g., damage to the organ during treatment).

Chest & abdomen datasets according to modality:
The most commonly used image modalities for chest and abdomen organ segmentation are MR and CT. As Table 1 shows, many datasets and challenges use MR images. MR images have higher resolution under certain conditions and resolve soft tissues and organs, such as the heart and prostate, better. Meanwhile, according to our survey, CT is the most widely used modality for organ segmentation and for other chest and abdomen tasks and diagnoses, such as those of the lung and liver, because of its convenience, effectiveness, and low cost.

Chest & abdomen datasets according to focus:
The purpose of these datasets and challenges can be categorized into three groups: further analysis, benchmark, and radiotherapy. Most datasets and challenges that provide annotated organ contours are intended for further analysis and treatment. One difficulty of segmentation is to segment the whole organ robustly and separate it from the background without omitting lesions and tumors; thus, some test benchmarks (116,128) are provided for researchers to evaluate their algorithms. Another difficulty, addressed by datasets and challenges (115,120), is the imbalance between organs of different sizes and shapes, which makes it hard to segment small organs and to provide valuable information for analysis and treatment.
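One widely used way to counteract such size imbalance during training is to weight each class inversely to its pixel frequency. The sketch below illustrates the idea under stated assumptions: the weighting formula (total / (number of classes x class count)), the function name, and the toy pixel counts are illustrative and are not prescribed by any of the datasets above.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes (small organs) receive large weights, frequent classes
    (background, large organs) receive small ones.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical per-pixel labels of one slice: a large and a small organ.
pixel_labels = ["background"] * 90 + ["liver"] * 9 + ["gallbladder"] * 1
weights = inverse_frequency_weights(pixel_labels)
```

Such weights would typically be passed to a weighted loss function, so that an error on a small organ costs more than the same error on the background.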

Single chest & abdomen organ contour segmentation:
Single-organ contour segmentation tasks usually focus on segmenting a region for subsequent tasks (110,122,123,124,129,138,144) or serve an anatomical purpose (118,126,133,137,142,143) for research. The difficulty of the former is that lesions and tumors may interfere with separating the organ from the background, while the difficulty of the latter is to achieve a more precise segmentation.

Chest & abdomen multi-organs contours segmentation:
Multi-organ contour segmentation of the chest and abdomen focuses on separating the organs from each other. Some of these datasets and challenges (113,114,116) cover multiple organs, including relatively large organs, which are easier to segment, and relatively small organs, which are more challenging, especially when the model handles large and small organs at the same time. Similarly, some of these datasets and challenges (115,120) focus on the "organ at risk", which means that these organs are healthy but might be at risk because of radiation therapy. Dataset (127) focuses on multi-atlas-based methods, which are widely used in brain-related research. Dataset (128) aims to provide a benchmark for segmentation algorithms.

Chest & abdomen organ parts segmentation
Different from contour segmentation of the chest and abdomen organs, organ segmentation aims to segment the organ into different parts. Just as the hand has five fingers, organs are made up of multiple parts, and a typical example is the Couinaud liver segmentation method. This subsection introduces the datasets and challenges for organ segmentation. These datasets and challenges are listed in Table 7.
Heart related datasets and challenges: Most of these datasets and challenges (112,119,125,130,132,134,140) are related to heart segmentation. The most frequently used modalities are MR and ultrasound, and the aim is to segment the heart into the left atrium, chambers, valves, and other parts. Though MR and ultrasound can effectively image the different tissues of the heart, the heartbeat results in blurred images, which makes the segmentation task more difficult, and for ultrasound, the dynamic nature of the images is another challenge for the segmentation algorithm.

Other chest & abdomen body parts: Challenge (139) provides 55 CT scans and focuses on the segmentation of the lung with the labeling of its different parts: outside the lungs, the left lung, the upper lobe of the left lung, the lower lobe of the left lung, the upper lobe of the right lung, the middle lobe of the right lung, and the lower lobe of the right lung. The biggest challenge is the effect of lung lesions and diseases, such as tuberculosis and pulmonary emphysema, on the performance of the segmentation. Moreover, challenges (135,141) focus on the segmentation of the lung vessels.

Datasets for diagnosis of chest & abdomen diseases
Diseases of organs in the chest and abdomen have a significant impact on human health. Therefore, many researchers work on this problem by analyzing medical images. Several researchers have designed automatic or semi-automatic algorithms for the classification, segmentation, detection, and characterization tasks to help medics diagnose these diseases. In this subsection, we describe the datasets and challenges related to the diagnosis of diseases of the chest and abdomen that are reported in Tables 8, 9, and 10, respectively.
Chest & abdomen datasets according to modality: According to the datasets and challenges collected, CT is the most commonly used imaging modality for the chest & abdomen, because of its suitable imaging quality and ability to clearly display tissues and lesions. Some datasets and challenges also provide CT images acquired with contrast agents for clearer images. Besides CT imaging, there are other modalities, including MR, X-Ray digital radiographs, PET, endoscopy, etc. MR images are used in breast-related diagnosis, cardiac-related tasks, soft tissue sarcoma detection, and ventilation imaging. Because of the organs' size and because CT resolution is limited by the imaging exposure time and radiation dose, MR is a more suitable imaging modality for small or specific organs. PET is always used with other modalities, such as CT and MRI. The contrast agent's density is related to metabolism, which means that the density of radiation from the contrast agent will be high in a tumor, so PET is always used for tumor-related tasks. Endoscopy images are used for medical inspection of the stomach, intestines, and other organs.

Chest & abdomen datasets according to classification of diseases:
The classification of diseases intends to determine whether a subject is healthy or not. It is sometimes called "detection" or "prediction", but it differs from the detection task presented below.
The main focus of these datasets is to judge whether there is any cancer, lesion, or tumor, such as soft tissue sarcoma (192), prostate lesion (177,184), lung cancer (161), and breast cancer (160). Classification is an effective task for diagnosis, particularly computer-aided diagnosis. A quick and early diagnosis allows effective interventions that increase the probability of the patient's recovery before the condition worsens.
Another focus is the classification of diseases. These diseases mainly include pneumothorax (164), cardiac diseases (175), tuberculosis (178), pneumonia (179), and COVID-19, which is discussed at the end of this subsection. The endoscopy-related challenges provide RGB images and videos with the aim of classifying patients as "normal" vs. "abnormal". Dataset (169) focuses on classification based on diagnostic records. These datasets and challenges provide data for researchers to design AI-based algorithms to diagnose common diseases.

Chest & abdomen datasets for attribute classification:
The characterization of tumors and lesions is also called attribute classification. It follows the detection and segmentation tasks and uses automatic analysis algorithms to characterize the tumors and lesions found. A typical example is the attribute classification of pulmonary and lung cancer (159,162,168,186,189,193). The datasets and challenges usually provide CT scans annotated with different attributes, such as lesion type, spiculation, lesion localization, margin, lobulation, calcification, cavity, etc. Each attribute includes two or more categories. Another focus is the characterization of breast-related lesions and tumors (187,191).

Chest & abdomen datasets for detection:
In most research and clinical situations, classification is not enough. Medics and researchers usually care about the cause of a disease and the localization of the lesion or tumor. Treatment evaluation, treatment planning, and interpretability are specific concerns for medics and DL researchers. Thus, detection and segmentation are tasks that currently receive a lot of attention. The detection task aims to find a region of interest and localize its position. The regions of interest usually include:
-Lung cancer and tumor (173,180,189,195,197)
-Pulmonary nodule (162,174,193,197,200)
-Celiac-related damage (202,203,204,206)
-Other lung lesions (172,183)
-Polyp (198,204)
-Cervical cancer ( 182
Furthermore, challenges (203,206) focus on the segmentation of artifacts (e. g., polyps) in endoscopic images.
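Localization quality in a detection task is commonly judged by the overlap between a predicted box and a reference box. The sketch below derives a bounding box from a lesion mask and computes the intersection over union (IoU) with a predicted box. The inclusive pixel-coordinate convention (row_min, col_min, row_max, col_max), the function names, and the toy data are assumptions for illustration; individual challenges define their own box formats and matching thresholds.

```python
def bounding_box(mask):
    """Tightest box (row_min, col_min, row_max, col_max) around nonzero pixels,
    with inclusive coordinates; None if the mask is empty."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def box_area(box):
    """Pixel area of an inclusive-coordinate box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive-coordinate boxes."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union


# Hypothetical lesion annotation and a hypothetical model prediction.
lesion_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
reference_box = bounding_box(lesion_mask)
predicted_box = (1, 2, 3, 3)
iou = box_iou(reference_box, predicted_box)
```

A detection is then usually counted as correct when the IoU exceeds a chosen threshold, which is itself a design choice that varies between benchmarks.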

COVID-19:
In 2020, COVID-19 became a research focus because it caused more than 100 million infections and two million deaths. Different datasets and challenges focus on this devastating disease and provide data to help researchers develop deep learning models to detect COVID-19 via various medical image modalities.
In terms of modalities, most of these datasets and challenges use either CT or CR images, and some provide both modalities. One exception is dataset (150), which uses ultrasound images. These datasets provide image annotations labeled by radiologists.
Most of these datasets and challenges are related to classification tasks. Datasets (145,146,147,157,159) directly focus on distinguishing COVID-19 patients from normal subjects. In contrast, datasets and challenges (149,150,151,153,154) focus on distinguishing COVID-19 from a few other similar diseases that can also lead to lung opacity or other symptoms, such as Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress Syndrome. Moreover, other datasets and challenges (148,152) focus on the diagnosis task with natural language processing, genomics, or clinical methods.
Similarly, some other datasets (147,155,156) focus on the segmentation or detection of COVID-19 related lesions, such as ground-glass opacity, air-containing space, and pleural effusion.

Datasets for other chest and abdomen-related tasks
Besides the classification, detection, and segmentation tasks, there are also several other tasks which are the current focus of research. In the following, we present the datasets and challenges related to these tasks, and report them in Table 11.
Chest & abdomen datasets for regression: Similar to attribute classification, regression aims to compute or measure target attributes from given images; the difference is that the outputs of regression are continuous. A typical example is fetal biometric measurements (217,227). These challenges provide ultrasound images to help researchers design algorithms that measure such attributes in order to estimate the gestational age and monitor the fetus's growth. Another example is cardiac measurements (212,213,214,218,221,226,231). These datasets and challenges provide MR or ultrasound images to analyze the heart's attributes and detect heart diseases.
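To illustrate what a continuous regression target can look like, the sketch below assumes that a fetal head outline is approximated by an ellipse and estimates its circumference with Ramanujan's approximation for the perimeter of an ellipse; predictions are then compared with reference values by the mean absolute error. The ellipse assumption, the function names, and all numeric values are illustrative and are not taken from the challenges above.

```python
import math


def ellipse_circumference(semi_major, semi_minor):
    """Ramanujan's approximation: pi * (3(a + b) - sqrt((3a + b)(a + 3b)))."""
    a, b = semi_major, semi_minor
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired continuous values."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical semi-axes in millimetres for one fitted head outline.
circumference_mm = ellipse_circumference(45.0, 35.0)

# Hypothetical predicted vs. reference measurements in millimetres.
error_mm = mean_absolute_error([100.0, 152.0], [102.0, 150.0])
```

Unlike a classification accuracy, the error here is expressed in the physical unit of the measurement, which is what makes regression results directly interpretable for clinical use.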
Chest & abdomen datasets for tracking: Tracking is a critical task because our body and organs move during imaging. For organs such as the heart, the characteristics of their motion are informative. Challenges (222,224) provide ultrasound data to track the liver for follow-up analysis after surgery and treatment. Datasets and challenges (215,228,229) focus on the tracking of the heart. They provide ultrasound images to track and analyze the heart.
Chest & abdomen datasets for registration: Challenge (216) focuses on the CT registration of lungs and provides CT scans with and without contrast agents. Meanwhile, challenges (220,225) focus on the registration between different modalities of the heart and provide MR, CT, and other modalities to register images of beating hearts.

Datasets for other chest & abdomen related tasks:
Challenges (210,223) focus on localizing specific landmarks, including the amniotic and the heart, using ultrasound and MR images. Challenge (211) focuses on the classification of surgery videos. Dataset (232) focuses on the reconstruction of the coronary artery.

Datasets and challenges for pathology and blood
Though radiography, MR imaging, and other imaging modalities have been used as the basis for diagnosis, pathology images are also used as a gold standard for diagnosis, particularly for tumors and lesions. Digital pathology images are generally obtained by collecting tissue samples, making slices, staining, and imaging. Therefore, pathology images are also one of the mainstream image modalities that are used for diagnosis.
The focus of these datasets and challenges includes 1) the identification and segmentation of basic elements (e. g., cells and nuclei) in pathology images, and 2) blood-based diagnosis from images. In this section, we present datasets and challenges of pathology images (Subsection 5.1) and cover the datasets and challenges of blood images in Subsection 5.2.

Datasets & challenges for pathology
Pathology images are used as the basis of cancer diagnosis. Pathologists and automatic algorithms analyze images based on specific features, such as cancer cells and cells undergoing mitosis. Many organizations and researchers provide datasets and challenges that focus on the microcosmic level and on the whole slide image (WSI) level. The relevant datasets and challenges are listed in Table 12.

Imaging datasets & challenges:
In most situations, WSI is used in pathology diagnosis. Unlike CT or MR images, a pathology image is an optical image similar to a picture taken by a camera. However, one major difference is that a pathology image is formed by transillumination, while an ordinary photo is formed by reflection. Another difficulty is the size of the image. A WSI is stored in a multi-resolution pyramid structure. A single multi-resolution WSI is generally obtained by capturing many small high-resolution image patches, and it might contain up to billions of pixels. Thus, WSI is used as a virtual microscope in diagnosis for clinical research, and many challenges use WSI, such as (237,245,246,254,255,256). However, in some situations, the WSI is not suitable for analysis tasks, for example, cell segmentation. Therefore, pathology image patches are used in several other challenges, such as (233) for visual question answering, (259) for mitosis classification, and (238,248) for multi-organ nucleus detection and segmentation.
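The multi-resolution pyramid can be pictured as a list of progressively downsampled copies of the full-resolution image. The sketch below computes the level dimensions for a hypothetical slide; the downsampling factor of 2 per level, the stopping rule (`min_side`), and the slide size are assumptions for illustration, since real slide formats choose their own level layouts.

```python
def pyramid_levels(width, height, downsample=2, min_side=256):
    """Dimensions of each pyramid level, from full resolution downwards.

    A new level is added as long as its shorter side stays >= min_side.
    """
    levels = [(width, height)]
    while min(levels[-1]) // downsample >= min_side:
        w, h = levels[-1]
        levels.append((w // downsample, h // downsample))
    return levels


# Hypothetical slide of 100,000 x 50,000 pixels at full resolution.
levels = pyramid_levels(100_000, 50_000)
full_resolution_pixels = levels[0][0] * levels[0][1]  # 5 billion pixels
```

A viewer acting as a "virtual microscope" reads from a coarse level when zoomed out and from finer levels when zoomed in, so that only a small part of the billions of pixels is ever loaded at once.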
Datasets for stain: Slides made from human tissues are colorless and need to be stained. The commonly used stains include Hematoxylin, Eosin, and Diaminobenzidine. Usually, two or more stains are used to stain a slide, and the most commonly used combinations include Hematoxylin & Eosin (H&E) and Hematoxylin & Diaminobenzidine (H-DAB).

Pathology datasets according to disease:
Pathology slides are widely used in the diagnosis of many diseases, especially cancer. Cancer cells and tissues differ in shape from their normal counterparts, which is why diagnosis via pathology is regarded as the gold standard. Many datasets and challenges, such as (238,248,260), do not address any specific disease. Others target specific diseases, such as breast cancer (239,244), myeloma (262,268), cancers of the digestive system (241), cervical cancer (257,258), lung cancer (247), thyroid cancer (253), and osteosarcoma (240).
Pathology datasets according to task: Generally speaking, the tasks addressed by these datasets and challenges fall into two categories: microcosmic tasks and WSI-level tasks. The latter target the diagnosis of diseases through classification. Beyond simple classification, many datasets and methods focus on more complex tasks, such as the segmentation of tumor cell areas (238,248,249) and the detection of pathological features (241,255). The microcosmic tasks derive from clinical analysis: they identify cells and detect mitosis in order to extract key features from pathology images that support further disease diagnosis. The following subsections expand on the microcosmic tasks and the WSI-level tasks, respectively.

Microcosmic related datasets
Microcosmic tasks focus on extracting microcosmic features (e.g., nucleus features) that feed into further diagnosis and WSI-level tasks. In this subsection, we introduce the datasets and challenges related to microcosmic tasks.
Data: Unlike WSI-level datasets, the datasets and challenges that focus on microcosmic tasks usually provide small, high-resolution, patch-level images. These patches are suitable for annotating microcosmic-level objects and for resource-limited algorithms. The image size varies with the analysis task. For the segmentation and detection of cells and nuclei, images are usually about a thousand pixels square, so that they contain a suitable number of cells or nuclei. For tasks that analyze individual cells (e.g., mitosis determination), the image usually covers a single cell. For other tasks (e.g., patch-level classification), the size varies from dataset to dataset.

Pathology datasets for cell detection & segmentation:
Cells are the essential elements of a pathology image, and analyzing them is one of the most effective ways to extract image features for diagnosis.
Pathologists draw on their knowledge and expertise to analyze the size, shape, pattern, and stained color of cells and to classify them as normal or abnormal. Thus, many datasets and challenges focus on the segmentation and detection of cells. Ideally, cells and nuclei are placed neatly on the slide. During slide preparation, however, cells may overlap or be distributed randomly on the slide. To address this problem, challenges (257,258) focus on the segmentation and detection of overlapping cells and nuclei. Cells from different organs may differ in shape and size and therefore pose different recognition and analysis challenges. Challenges (238,248) thus focus on multi-organ cell or nucleus segmentation.
Pathology datasets for patch-level classification: Generally, a WSI is too large for every cell and the relationships between cells to be analyzed. DL-based methods can readily extract essential information from patch-level images through feature learning to support the diagnosis, and many datasets and challenges focus on this problem. The datasets and challenges that provide patch-level images mainly address classification, segmentation, or detection tasks. Owing to the quality of its feature learning, DL has reached state-of-the-art performance in many areas of computer vision. Therefore, some datasets and challenges focus on the patch as a whole rather than on individual cells. The tasks range from the segmentation, detection, and classification of cells to the direct classification of the patch. Challenges (242,250,252,253) focus on patch-level image classification to determine whether metastatic tissue or a different tissue type is present.
Datasets for other pathology tasks: Besides the detection and segmentation of cells and patch-level classification, there are other microcosmic tasks. Challenge (259) focuses on mitosis detection for nuclear atypia scoring. The atypical shape, size, and internal organization of cells are related to the progress of cancer: the more advanced the cancer, the more atypical the cells look. Challenge (260) focuses on cell tracking, i.e., on how cells change shape and move as they interact with their surrounding environment. This is key to understanding the mechanobiology of cell migration and its multiple implications in normal tissue development and in many diseases. Challenge (233) focuses on visual question answering for pathology images, where the model is trained to pass a pathologist's examination.

Datasets for WSI-level tasks
WSI-level pathology tasks focus on cancer diagnosis and pathology image processing. A WSI contains the complete information needed to establish an accurate diagnosis for a patient. Automatic diagnosis algorithms can analyze a slide quickly, which is especially useful in developing countries where experienced pathologists are scarce. However, directly analyzing a WSI is challenging for both pathologists and algorithms, because a WSI can be as large as 100,000 × 100,000 pixels. To address this, most of the current datasets and challenges focus on the classification and segmentation of biomarkers, cells, and other regions of interest. At the end of this subsection, we introduce other datasets and challenges that are related to the regression and localization of tumors and biomarkers.
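To give a sense of scale for a 100,000 × 100,000 pixel slide, the following sketch estimates its uncompressed size and the number of non-overlapping patches needed to cover it. The 3-bytes-per-pixel RGB encoding and the patch sizes of 256 and 512 pixels are assumptions made for illustration, not values taken from any specific dataset.

```python
import math


def count_patches(width, height, patch_size):
    """Number of non-overlapping square patches needed to cover the image."""
    cols = math.ceil(width / patch_size)
    rows = math.ceil(height / patch_size)
    return cols * rows


side = 100_000                  # upper bound on WSI side length quoted in the text
raw_bytes = side * side * 3     # assumed 8-bit RGB, uncompressed

patches_256 = count_patches(side, side, 256)
patches_512 = count_patches(side, side, 512)

print(f"uncompressed size: {raw_bytes / 1e9:.0f} GB")
print(f"256-pixel patches: {patches_256}")
print(f"512-pixel patches: {patches_512}")
```

Under these assumptions a single slide expands to tens of gigabytes and more than a hundred thousand small patches, which is one reason why WSI-level methods typically work on patches or on lower-resolution pyramid levels.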

Datasets for classification of WSI:
The primary goal of examining pathological images, especially WSI, is to diagnose cancer. How to classify very large WSIs with limited computing resources is therefore a research challenge. Datasets and challenges (234,236,237,245) focus on predicting cancer or on grading WSI, e.g., Gleason grading or HER2 evaluation. At the same time, some datasets and challenges (244,254,255) focus on the classification of metastasized cancer.
Datasets for segmentation and detection of WSI: DL-based methods that process pathology images are often seen as black boxes. They achieve state-of-the-art performance, but their decisions remain difficult to interpret. To serve the pathologists' point of view, datasets and challenges (235,241,246,254,255) focus on segmentation and detection tasks that identify the critical elements leading to a particular diagnosis, such as cancer cell areas and signet ring cells.
Datasets for other WSI tasks: Besides classification and detection, there are a few other tasks based on WSI. These include the registration of pathology images (243) for data pre-processing and the localization of lymphocytes (242).

Blood-related datasets
Blood image analysis is the basis of the diagnosis of many diseases. In contrast to pathology images, images of blood samples mainly contain blood cells, and these datasets and challenges are aimed at blood-related cancer and cell counting. Similar to pathology images, these datasets and challenges also focus on the segmentation, detection, and classification of cells. The relevant datasets and challenges are listed in Table 13. One of the primary tasks of these datasets is the classification of cells, which focuses on identifying the different types of cells. Dataset (271) focuses on classifying red blood cells, white blood cells, platelets, and other cells. At the same time, dataset (264) focuses on the classification of malignant and non-malignant cells. Other datasets and challenges, namely (268) (multiple myeloma segmentation), (263) (mitochondria segmentation), and (266) (malaria detection), focus on the segmentation and detection of blood cells and biomarkers.

Table 13: Summary of datasets and challenges of blood-related image analysis tasks.

Other datasets
Although we have categorized the datasets and challenges into three parts, namely "head and neck", "chest and abdomen", and "pathology and blood", several other datasets cannot be placed under these three areas. In this section, we introduce the datasets and challenges categorized under "other", which means that they do not fit into the above categories but are still relevant to DL methods. The topics of this section include bone (Subsection 6.1), skin (Subsection 6.2), phantom (Subsection 6.3), and animal (Subsection 6.4).

Bone-related datasets
Medical image analysis of bone is currently a major research focus. Radiography is the most effective way to image bones, because X-rays are strongly attenuated by the calcium that makes up human bones. The segmentation of bone, the detection of abnormalities, and their characterization are meaningful clinical and research tasks. Therefore, the following subsections discuss the datasets and challenges for classification, segmentation, and other tasks, and Table 14 reports these datasets and challenges.
Bone datasets for classification: Classification for bone-related computer-aided diagnosis is a focus of many researchers. Although classification cannot locate regions of interest, it can still help orthopedists judge whether a patient is healthy, as in dataset (283). The diagnosis of tears and abnormalities is also a research focus, e.g., meniscal tears (279), vertebral fractures (282), and knee abnormalities (279).

Bone datasets for segmentation: The segmentation of bone images plays a vital role in clinical diagnosis and treatment. Computer-aided segmentation algorithms and orthopedists need to segment the different parts of the bone in a given image and make a sound judgment in order to provide adequate treatment.
The difficulty with such tasks is the low resolution of the images compared with other imaging modalities. These datasets and challenges focus on the spine (282, 284), vertebrae (275,276,281), and knee cartilage (285).

Skin-related datasets
Skin cancer is one of the most common types of cancer, and melanoma is one of the most lethal types of skin cancer. To diagnose skin cancer, dermoscopy is used to image the skin, and classification, segmentation, and detection tasks are employed. The most relevant datasets and challenges are reported in Table 15.
To support the computer-aided diagnosis of melanoma, ISIC released datasets and a series of challenges for clinical training and for the development of automatic algorithms. The ISIC challenges include 2017 (290), 2018 (289), and 2019 (288). Challenges (289, 290) each include three sub-challenges on thousands of dermoscopic images: lesion segmentation, lesion attribute detection, and lesion classification. Challenge (288) focuses on the classification of melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, vascular lesion, squamous cell carcinoma, and others. Challenge (287), i.e., ISIC 2020, focuses on the classification of melanoma to better support dermatological clinical work, with 33,126 images from more than 2,000 patients.
Moreover, challenge (286) focuses on diabetic foot ulcers (DFU). The challenge provides more than 2,000 images of feet, photographed with regular cameras under a consistent light source and annotated by experts, for training and testing automatic detection and classification algorithms.

Phantom-related datasets
A phantom is an object made of a specific material that is mainly used to evaluate medical imaging equipment. A phantom has low cost and risk, and it is easy to image and to annotate (291, 296, 297). The related datasets and challenges are reported in Table 15.

Animal-related datasets
Medical image analysis of animal material is a relatively small research area. However, it is less constrained by privacy and strict ethical restrictions than the analysis of human medical images. The datasets and challenges we found focus on animal brain segmentation (299, 301), depth estimation from endoscopy (300), and multi-modality registration (292). The relevant datasets and challenges are reported in Table 15.

Discussions
The success of AI algorithms such as DL has led to their widespread use in several fields, including medical image analysis. Researchers with different knowledge and backgrounds tackle image-based clinical tasks using computer vision tools to design automatic algorithms for different applications [11,12,260,261,262,263,264]. Although AI algorithms can successfully handle many tasks, several unsolved problems and challenges hinder the development of AI-based medical image analysis.

Problems and challenges
DL-based algorithms learn from real input images through gradient descent. Large-scale annotated datasets and powerful models are both key to successful DL. For example, the success of AlexNet [14], GoogLeNet [2], and ResNet [3] rests on powerful models with millions of parameters. At the same time, a large-scale dataset, such as ImageNet [265], is needed to tune such a large number of parameters. However, when these methods are applied to medical image analysis, many domain-specific problems and challenges appear. This subsection discusses some of these challenges.

Data scarcity
The biggest challenge in the development of DL models is data scarcity. Unlike in other areas, medical image datasets are usually small because of many limitations, e.g., ethical restrictions. The datasets commonly used in traditional computer vision are much larger. For example, the handwritten digits dataset MNIST [266] includes a training set of 60,000 examples and a testing set of 10,000 examples; the ImageNet dataset [265] includes about three million images for training and testing; Microsoft COCO [267] includes more than two million labeled object instances. In contrast, many medical image datasets include only hundreds or at most thousands of images. For example, the challenge BraTS 2020 (30) includes four hundred subjects with several modalities per subject; the challenge REFUGE (70) provides about 1,200 images of the eye; the challenge LUNA 16 (186) provides 888 CT scans; our recently published dataset of pulmonary lesions [268] provides just 694 scans; the challenge CAMELYON 17 (254) contains only about 1,000 WSI pathology images.
There are multiple reasons for the lack of data. The main cause is the restricted access of non-medical researchers to medical images, i.e., barriers between disciplines. The root causes of these barriers are the cost and difficulty of annotation and the access restrictions imposed by ethics and privacy.
Access to data: As mentioned in the introduction, the direct cause of data scarcity is that most non-medical researchers are not allowed to access medical data directly. Although large amounts of medical data are generated worldwide every day, most non-medical researchers have no authorization to access clinical data. The easily accessible data are the publicly available datasets, but these are not large enough to properly train a DL model.
Ethical reasons: The ethics of medical data usage is a major bottleneck and limitation for researchers, particularly computer scientists. Medical data stored in databases often contain sensitive or private information, such as name, age, gender, and ID number. In some cases, the medical images themselves can be used to identify a patient. For example, if an MR scan includes the face, an intruder could identify the patient for a malicious purpose. In most countries and regions, it is illegal to distribute data containing private information without the patients' permission, and patients rarely consent to such distribution. Therefore, it is practically impossible for deep learning researchers to gain authorization to access these raw datasets.
Even to gain authorization to access desensitized data, DL researchers still need to pass ethical reviews.
Annotation: Another root cause is the difficulty of annotating medical images. Unlike in other computer vision areas, the annotation of medical images requires specialized professional knowledge. For example, in autonomous driving, annotating objects such as vehicles and pedestrians places no specific requirements on the annotators, because most of us can easily distinguish a car from a human. When annotating medical images, however, domain-specific knowledge is essential. A layperson might occasionally be able to tell abnormal tissue from normal tissue, but it is practically impossible for a non-specialist to mark a lesion's contour or diagnose a disease.
This difficulty cannot easily be solved even when professionals are employed to annotate the data. First, the cost of annotating medical data is huge. Once researchers and their organization have obtained some data, they need to spend more money to employ a few medics to label it. This annotation cost is enormous, particularly where medical resources are scarce or where medical costs are high. For example, the challenge PALM (74) provides about 1,200 annotated images, but its organizers involved only two clinical medics. Second, the physician who annotates the data is required to have rich clinical and diagnostic experience, which further reduces the number of people suitable for the task. Third, to avoid subjectivity, each image needs to be annotated by two or more physicians. This raises another problem: what to do if the labels of two annotators differ. In many challenges, the organizers employ several junior physicians to annotate and a senior physician to decide when the junior physicians' annotations disagree. For example, in the challenge AGE (71), each annotation is determined by the mean of the annotations of four independent ophthalmologists in a group and is then manually verified by a senior glaucoma expert.
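The multi-annotator procedure described above (average several independent annotations, then let a senior expert verify) can be sketched as follows. This is a minimal illustration for point annotations given as (x, y) pixel coordinates. The function name `consensus_point`, the example coordinates, and the 10-pixel disagreement threshold are hypothetical and are not taken from the AGE challenge protocol.

```python
import math
from statistics import mean


def consensus_point(points, max_spread=10.0):
    """Average several (x, y) annotations and flag large disagreement.

    Returns the mean point and a flag telling whether any annotation lies
    farther than max_spread pixels from that mean, in which case the case
    should be prioritized for review by a senior expert.
    """
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    spread = max(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    return (cx, cy), spread > max_spread


# Four hypothetical annotators who roughly agree.
agree = [(100.0, 200.0), (102.0, 198.0), (98.0, 202.0), (100.0, 200.0)]
center_agree, review_agree = consensus_point(agree)

# Four hypothetical annotators, one of whom clearly disagrees.
disagree = [(100.0, 200.0), (102.0, 198.0), (98.0, 202.0), (160.0, 200.0)]
center_disagree, review_disagree = consensus_point(disagree)

print(center_agree, review_agree)
print(center_disagree, review_disagree)
```

Averaging works for continuous annotations such as landmark positions; for categorical labels, a majority vote with the same review flag would be the analogous choice.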

Limitation of medical data
The characteristics of medical images themselves pose difficulties for medical image analysis tasks.
There are many types and modalities of images used in medical image analysis. As in computer vision, the modalities include both 2D and 3D. However, medical images differ in several other respects. Although medical image datasets are on average smaller than datasets in other computer vision fields, each individual sample is on average larger.
For 2D images, CR, WSI, and other modalities vary more in resolution and color than images in other computer vision fields. Some modalities need more bits to encode a pixel, while others produce extremely large images. For example, CAMELYON 17 (254) includes only about a thousand pathology images, but the whole dataset is about three terabytes. Such datasets with few but large samples pose a challenge for AI algorithms, and designing algorithms that can learn from limited resources (e.g., a small number of labeled samples) and still be useful for clinical diagnosis has become a research focus.
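The "few but large samples" contrast can be made concrete with a short calculation based on the figures quoted above. The sketch assumes decimal terabytes and exactly 1,000 slides for CAMELYON 17, and it compares against an uncompressed MNIST digit, which is a 28 × 28 image with one byte per pixel (a property of MNIST that is not stated in this article).

```python
# Figures quoted in the text: about three terabytes for about a thousand slides.
camelyon_total_bytes = 3 * 10**12   # assumed decimal terabytes
camelyon_slides = 1_000             # "about a thousand", rounded for illustration
bytes_per_slide = camelyon_total_bytes // camelyon_slides

# One uncompressed MNIST digit: 28 x 28 pixels, 8 bits each.
mnist_bytes_per_image = 28 * 28

ratio = bytes_per_slide / mnist_bytes_per_image

print(f"average slide size: {bytes_per_slide / 1e9:.1f} GB")
print(f"one slide is roughly {ratio:,.0f} times larger than one MNIST digit")
```

Under these assumptions, a single slide carries several million times more raw data than a single MNIST image, even though the dataset contains far fewer samples.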
3D medical images such as CT and MRI are dense 3D data, in contrast to sparse data such as the point clouds used in autonomous driving. As in the BraTS series of challenges (30,31,32,33,34,35,36,37,38), many researchers face the challenge of designing algorithms that can learn effectively from multi-modal datasets.
These characteristics of medical images require well-designed algorithms that are robust enough to fit the data well without overfitting. However, this in turn leads to the need for more data and resources. It is a challenge to learn suitable features from a small-sample dataset.

No silver bullet
The ideal scenario would be to find or invent a method or an algorithm that solves all of these problems simultaneously. However, there is no silver bullet. The problems and challenges related to the data and the adopted methods cannot be entirely resolved, and sometimes a new problem arises as another is solved. Nevertheless, many ideas have been proposed to address the current problems, and they are introduced in this subsection.
With respect to the problems and challenges mentioned above, researchers are working in two directions: 1) more effective models that need less data, and 2) more practical ways to access data. To learn from small datasets, researchers use approaches such as few-shot learning and transfer learning. To access more data, researchers adopt three main approaches, namely federated learning, lifelong learning, and active learning.

Practical learning from small samples
Many medical image datasets have a small number of samples. For example, challenge MRBrains13 (6) includes only 20 subjects for training and testing, while challenge KITS 19 (110) has about two hundred subjects. Therefore, many researchers struggle to find a practical approach to learning from small samples.
Few-shot learning and zero-shot learning: Few-shot learning addresses one of the critical problems of DL-based medical image analysis, i.e., developing DL models with less data. Humans can learn effectively from few samples: unlike standard deep learning-based methods, a human learns to diagnose a disease from images without having to view tens of thousands of them. Meta-learning, also called learning to learn, is one approach to the few-shot learning problem; it learns meta-features from a small amount of data. This is relevant because most medical image datasets and challenges contain far fewer images than typical computer vision datasets and challenges. Mondal et al. [269] combine few-shot learning with a GAN, modified for semi-supervised learning, to segment medical images. Similar to few-shot learning, zero-shot learning targets novel samples, i.e., classes not seen during training. Rezaei et al. [270] review zero-shot learning in applications ranging from autonomous vehicles to COVID-19 diagnosis. However, zero-shot and few-shot learning also have disadvantages, such as domain gap, overfitting, and limited interpretability.
Knowledge transfer: Besides zero-shot and few-shot learning, transfer learning (or knowledge transfer) is another method; it recognizes and applies knowledge and skills learned from a previous task. For example, white matter and gray matter segmentation and multi-organ segmentation are both segmentation tasks. Neural networks are usually trained independently for each task, and almost nobody trains one network on two tasks at once, but this does not mean that the two tasks are unrelated. Transfer learning can be applied between two similar tasks and between different domains. Its most significant advantage is that it uses large-scale datasets to pre-train the neural network and then fine-tunes and transfers the network to the main task using a dataset with only a few samples.
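The pre-train-then-fine-tune idea can be sketched in a few lines. In this toy illustration, `pretrained_features` is a hypothetical stand-in for a network pre-trained on a large dataset; it is kept frozen, and only a small logistic-regression head is trained on four labeled target samples. Real transfer learning would use a deep network and a DL framework, and often fine-tunes some backbone layers as well; all names and numbers here are assumptions for illustration.

```python
import math


def pretrained_features(x):
    """Stand-in for a frozen, pre-trained feature extractor (hypothetical)."""
    return [x[0] + x[1], x[0] - x[1]]


def fine_tune_head(samples, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = pretrained_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            g = p - y                        # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0


# A tiny target dataset: only four labeled samples.
samples = [(1.0, 1.0), (2.0, 1.0), (-1.0, -1.0), (-2.0, -1.0)]
labels = [1, 1, 0, 0]

w, b = fine_tune_head(samples, labels)
train_predictions = [predict(x, w, b) for x in samples]
print(train_predictions, predict((3.0, 2.0), w, b), predict((-3.0, -2.0), w, b))
```

Because the feature extractor is reused rather than learned, only three parameters have to be fitted on the target data, which is what makes training on so few samples feasible.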

Effective access to more samples
Besides finding practical approaches to learning from small samples, many researchers have been working on active learning and on federated learning (which aims to use data without accessing sensitive information). These approaches can also reduce the annotation cost of deep learning algorithms.
Federated learning: Federated learning provides another way to access data. As discussed previously, access to data is limited by privacy and other concerns. Instead of sharing the data directly, federated learning shares the model, which protects private information from being leaked. Combined with other privacy protection methods, federated learning can make effective use of the data held by each independent data center or medical center.
However, federated learning has two disadvantages: annotation and implementation. Federated learning does not solve the annotation problem, which must be addressed by other methods. The main challenge is implementation, as only a few institutions have attempted federated learning so far. For example, Intel and other institutions have applied federated learning to brain tumor-related tasks in their research [271]. The main challenges in their implementation include: 1) the implementation and proof of privacy protection, 2) the methodology for sharing and updating millions of model parameters, and 3) preventing attacks on DL algorithms and leaks of private data on the Internet or on computing nodes.
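The principle of sharing the model instead of the data can be sketched with federated averaging, a widely used aggregation scheme. The article does not state which aggregation method is used in [271], so this choice is an assumption. In the toy example, three hypothetical sites each hold private samples of the relation y = 2x + 1, train a two-parameter linear model locally, and send only the parameters to a server that averages them, weighted by local dataset size.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD on one site's private data; return new parameters."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Average the site parameters, weighted by each site's number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(sites, rounds=50):
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in sites]
        # Only parameters leave the sites; the raw data never does.
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Hypothetical private datasets held by three medical centers.
sites = [
    [(-1.0, -1.0), (1.0, 3.0)],
    [(-0.5, 0.0), (0.5, 2.0), (0.0, 1.0)],
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0), (0.5, 2.0)],
]

w, b = train_federated(sites)
print(f"learned model: y = {w:.3f} * x + {b:.3f}")
```

The sketch omits everything that makes real deployments hard, including the challenges listed above: secure communication, proof of privacy protection, and the cost of exchanging millions of parameters per round.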
Natural language processing: Natural language processing is also a potential tool for annotating medical image data automatically or semi-automatically. It is standard procedure for a medic to write a diagnostic report for the patient, particularly after a medical image has been taken. After desensitization, such large amounts of paired data (image and text) are useful for medical image analysis, and natural language processing can be used to derive annotations from the reports. Several natural language processing-based methods, e.g., [272,273,274], have been applied in medical research fields.
Active learning: Active learning aims to reduce the annotation cost by using the unlabeled data to select the "best" samples to annotate. Generally, data annotation for deep learning requires experts to label the data so that the neural network can learn from it. Active learning does not require many labeled samples at the beginning of training. In other words, active learning can "help" annotators label their data: it uses the knowledge learned from the labeled data to select which unlabeled data should be annotated next. The selected data, once annotated, are then used to train the network over the following epochs. Active learning [275,276] is used in medical image analysis in a loop in which 1) the algorithm learns from the data annotated by humans, 2) humans annotate the unlabeled data selected by the algorithm, and 3) the algorithm adds the newly labeled data to the training set. The advantage of active learning is clear: annotators do not need to annotate all the data they have, and at the same time, the neural network can learn faster from this interactive process.
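The three-step loop above can be sketched with uncertainty sampling, one common selection strategy (the article does not specify which strategy [275,276] use, so this is an assumption). The toy task is to learn a decision threshold on a one-dimensional feature: the model is the midpoint between the largest labeled negative and the smallest labeled positive, and the algorithm always asks the annotator about the unlabeled sample closest to that threshold. The `oracle` function stands in for the human expert, and all values are hypothetical.

```python
def fit_threshold(labeled):
    """Step 1: learn from labeled data (midpoint between the two classes)."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2


def active_learning(pool, oracle, seed, budget):
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        # Step 2: select the most uncertain sample and ask the annotator.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        # Step 3: add the newly labeled sample to the training set.
        labeled.append((query, oracle(query)))
    return fit_threshold(labeled), len(labeled)


# Hypothetical pool of 101 unlabeled samples and a stand-in for the expert.
pool = [i / 100 for i in range(101)]
oracle = lambda x: int(x > 0.62)

threshold, n_labels = active_learning(pool, oracle, seed=[0.0, 1.0], budget=10)
print(f"threshold {threshold:.3f} found with {n_labels} of {len(pool)} labels")
```

In this toy setting the loop locates the class boundary after labeling 12 of the 101 samples, which illustrates why annotators do not need to label the whole pool.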

Conclusion
In this work, we have provided a comprehensive survey of the datasets and challenges for medical image analysis, collected between 2013 and 2020. The datasets and challenges were categorized into four themes: head and neck, chest and abdomen, pathology and blood, and others. We provide a summary of all the details about these themes and data. We also discuss the problems and challenges of medical image analysis and the possible solutions to these problems and challenges.

Table 1: Summary of datasets and challenges for the basic brain image analysis.

Table 2: Summary of datasets and challenges for the brain lesion and tumor segmentation task. Footnote 1: For BraTS 17 to 20, the T1 modality includes the T1 image and the T1Gd image.

Table 3: Summary of datasets and challenges for brain disease classification tasks.

Table 4: Summary of datasets and challenges of eye-disease related tasks. Footnote 1: AMD: age-related macular degeneration; DR: diabetic retinopathy; G: glaucoma. Footnote 2: All the diseases of this dataset are listed on the official website. See https://riadd.grand-challenge.org/Data/.

Table 5: Summary of datasets and challenges of head and neck related diseases.

Table 6: Summary of datasets and challenges that are used for behavioral and perception related tasks.

Table 8: Summary of datasets and challenges for chest and abdomen organs-related tasks I.

Table 9: Summary of datasets and challenges for chest and abdomen organs-related tasks II.

Table 10: Summary of datasets and challenges for chest and abdomen organs-related tasks III.

Table 11: Summary of datasets and challenges of other medical applications in chest and abdomen. Footnote 2: See the official description: http://crt-epiggy19.surge.sh/datasets.html.

Table 12: Summary of datasets and challenges for pathology-related image analysis.

Table 14: Summary of datasets and challenges of bone-related image analysis tasks.

Table 15: Summary of datasets and challenges for skin, phantom, and animal related image analysis tasks.