Face Recognition via Thermal Imaging: A Comparative Study of Traditional and CNN-Based Approaches

In this article, we present a comparative study of traditional and CNN-based approaches to face recognition via thermal imaging. The methodology comprises two distinct components: traditional face recognition and CNN-based face recognition. The traditional component employs Random Forest (RF) and Support Vector Machine (SVM) classifiers, whereas the CNN-based component leverages Convolutional Neural Networks (CNN) to identify individuals. Our research involves a comprehensive evaluation conducted across different databases, including the PUCV Drunk Thermal Face (PUCV-DTF) and the UCH Thermal Temporal Face (UCH-TTF) datasets. To emulate real-life scenarios, we introduce elements such as glasses, face masks, and noise into the original thermal images during experimentation. Under the challenging condition of wearing both glasses and a face mask, the traditional and CNN-based methods achieve recognition rates of 90% and 100% on the small- and medium-sized databases, respectively. Experimental results demonstrate the feasibility and effectiveness of the proposed methods, showcasing their robustness in tackling various challenges.


INTRODUCTION
Face recognition plays a vital role in various applications, including video surveillance, criminal identification, and the development of smart cities. Most existing methods have traditionally focused on utilizing conventional CCD/CMOS cameras in the visible spectrum due to their availability and cost-effectiveness. However, achieving reliable face recognition in real-world scenarios remains a significant challenge [1,2]. Notably, several factors such as illumination, pose, and disguises have posed difficulties for face recognition within the visible spectrum. Consequently, enhancing face recognition in this context continues to be a prominent area of research.
Tackling the problem of illumination variations within the visible spectrum is a complex task. In response, one possible remedy involves the adoption of 3D devices that exhibit reduced sensitivity to changes in lighting conditions. However, these systems often encounter processing speed limitations. To overcome these challenges, researchers have explored the application of infrared imagery. Thermal infrared imagery, which operates at wavelengths well beyond the visible spectrum's 0.35 to 0.74 μm range [3-5], boasts several notable advantages over visible light, especially in environments characterized by inadequate illumination conditions.
In recent years, researchers have been exploring the techniques and applications of thermal infrared imagery, including lie detection [6] and human activity recognition [7]. Thermal cameras excel at capturing thermal radiation emitted by objects, converting it into temperature data, and creating images that visualize temperature distribution [8][9][10]. For instance, Zhu et al. [6] introduced a segmentation method for extracting forehead signatures from thermal video clips, which holds significant promise for deception detection. This technique relies on tracking a Region of Interest (ROI) on the forehead. Additionally, Uddin and Torresen [7] presented a human activity recognition system based on thermal camera data, harnessing robust features and deep recurrent neural networks. This approach proves remarkably effective for monitoring individuals in low-light environments, surpassing the capabilities of traditional RGB cameras.
Notably, thermal cameras facilitate the visualization and measurement of skin temperature. Human facial skin temperature closely correlates with the underlying blood vessel network. Several factors, including physiological, environmental, and imaging conditions, can influence the thermal imaging of a human face [11,12]. Each individual possesses a unique facial thermal pattern primarily determined by their vascular structure. Over different time intervals, minimal changes occur in this underlying structure [13,14]. These consistent thermal characteristics serve as the basis for matching thermal signatures to specific individuals, employing a technique akin to fingerprint recognition [15] for facial identity verification.
In a related development, Buddharaju et al. [16] introduced a recognition system centered on characteristic and time-invariant physiological data. They employed image processing techniques to pinpoint the superficial blood vessel network. Vigneau et al. [17] addressed challenges stemming from temporal variations in infrared face images. To tackle these issues, they employed five traditional feature-based methods to develop a thermal face recognition system. Hermosilla et al. [18] introduced a method that involved selecting specific thermal points on the face to create a feature vector for training the classifier. They utilized 22 distinct points for analysis across various scenarios.
Expanding upon the insights and contributions of these prior studies [17][18][19], this article embarks on a comprehensive exploration of face recognition using thermal imaging. Our primary objective is to achieve robust face recognition under diverse real-world scenarios, encompassing situations where the face is unobstructed (Normal), when noise is introduced, when individuals wear glasses, and when a face mask is worn. The proposed methodology consists of two core components: traditional face recognition and CNN-based face recognition. In the traditional component, we employed Random Forest (RF) [20] and Support Vector Machine (SVM) [21] models to identify individuals, while the CNN-based component harnesses the power of Convolutional Neural Networks (CNN) [22]. The experimental results confirm the feasibility and effectiveness of our proposed method, highlighting its robustness in successfully addressing a multitude of challenges.

THE PROPOSED METHODS
The proposed face recognition methods are divided into two key components: (1) traditional face recognition and (2) CNN-based face recognition. The traditional face recognition methods are introduced in section 2.1, while section 2.2 explores the CNN-based approach.

Traditional Face Recognition
In the conventional face recognition approach, RF and SVM models are typically employed. As illustrated in Figure 1, the proposed flowchart for traditional face recognition comprises two main phases: training and testing. The training phase consists of three key steps:
• Preprocessing: Human faces are extracted from the thermal images using the Bayesian framework [16]. The resulting face images are then standardized to dimensions of 81x150 pixels.
• Feature Extraction: From each thermal image, a grid of multiple data points is extracted, forming a feature vector. This feature vector serves as the input for training the recognition model, whether RF or SVM.
• Classification: After feature extraction, the recognition model (either RF or SVM) is trained using the generated feature vectors.
We chose specific thermal points on the face to create the feature vector for training the classifier. In Figure 2, each black block represents a 3x3 pixel neighborhood. To mitigate the impact of noise, we calculate the average intensity within each block, as illustrated in Figure 2-a. Figure 2-b displays the feature vectors corresponding to 12 positions that are typically not obscured by glasses or a face mask.
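The point-averaging and classifier-training steps above can be sketched as follows. The point coordinates and the toy data below are illustrative placeholders, not the paper's actual 12- or 22-point layouts:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical (row, col) coordinates of thermal points on a 150x81
# face image; the paper's actual point layouts are not reproduced here.
POINTS = [(20, 20), (20, 60), (40, 40), (60, 20), (60, 60), (80, 40),
          (100, 20), (100, 60), (120, 40), (130, 20), (130, 60), (140, 40)]

def extract_features(face, points=POINTS):
    """Average the 3x3 pixel neighborhood around each thermal point
    to suppress noise, yielding one feature per point."""
    return np.array([face[r-1:r+2, c-1:c+2].mean() for r, c in points])

# Toy example: synthetic "thermal" faces for 3 subjects, 10 images each,
# with each subject's skin temperature offset by 1 degree.
rng = np.random.default_rng(0)
X = np.array([extract_features(rng.normal(30 + label, 1, (150, 81)))
              for label in range(3) for _ in range(10)])
y = np.repeat(np.arange(3), 10)

# Train the two traditional classifiers on the feature vectors
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
svm = SVC(kernel="rbf").fit(X, y)
```

The 3x3 averaging acts as a cheap low-pass filter, so a single hot or dead pixel cannot dominate a feature.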

CNN-Based Face Recognition
Table 1 presents the architecture of our CNN model, which draws inspiration from VGGNet [23]. Our proposed CNN architecture comprises three convolutional layers and two fully connected (FC) layers. The convolutional layers consist of 16, 32, and 64 kernels, each with a size of 3x3. Within each FC layer, there are 1024 neurons. To mitigate overfitting, we have incorporated a dropout layer after the final convolutional layer and between the two FC layers.
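A minimal PyTorch sketch of this architecture follows. It assumes max pooling after each convolution, a 1x150x81 input, and a separate output layer; these details, along with the dropout rate, are not specified in the text:

```python
import torch
import torch.nn as nn

class ThermalFaceCNN(nn.Module):
    """Sketch of the described CNN: three 3x3 conv layers with 16, 32,
    and 64 kernels, two 1024-neuron FC layers, dropout after the last
    conv layer and between the FC layers. Pooling placement, dropout
    rate, and the output layer are assumptions."""
    def __init__(self, num_classes=46, in_hw=(150, 81)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.5),  # dropout after the final conv layer
        )
        # Infer the flattened size from a dummy forward pass
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, *in_hw)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 1024), nn.ReLU(),
            nn.Dropout(0.5),  # dropout between the two FC layers
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ThermalFaceCNN()
logits = model(torch.randn(2, 1, 150, 81))  # batch of 2 thermal faces
```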

EXPERIMENTAL RESULTS AND PERFORMANCE ANALYSIS
We have implemented the proposed face recognition methods using the Python programming language.All experiments were performed on a single RTX 2080 GPU.The experimental dataset includes images with occlusions and noise, allowing for a comprehensive evaluation of the method's effectiveness.
The samples from the PUCV-DTF database [18] are illustrated in Figure 4. This thermal database includes 46 individuals, each represented in five subsets: sober, 1-beer, 2-beers, 3-beers, and 4-beers, with each subset containing 50 images. Consequently, there are a total of 250 images per person. These thermal images were acquired over time using the FLIR Tau2 thermal imaging camera. In the preprocessing phase, each image is cropped and aligned to dimensions of 81x150 pixels based on the coordinates of the eyes.
Figure 5 displays sample images from the UCH Thermal Temporal Face (UCH-TTF) database [17]. This dataset consists of images collected from 7 distinct individuals, with each person contributing 50 images. The images were captured using a FLIR TAU 320 thermal camera. During preprocessing, the images were cropped and standardized to dimensions of 81x150 pixels.
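The eye-based crop-and-resize step used for both databases might be sketched as below. The crop proportions and the nearest-neighbor rescaling are assumptions for illustration, not the databases' exact preprocessing:

```python
import numpy as np

def crop_and_scale(img, eyes, out_hw=(150, 81)):
    """Crop a thermal image around the eye midpoint and rescale it to
    81x150 pixels with nearest-neighbor sampling. The crop box
    proportions below are guesses, not the published procedure."""
    (lx, ly), (rx, ry) = eyes
    cx, cy = (lx + rx) // 2, (ly + ry) // 2
    d = max(abs(rx - lx), 1)                       # inter-eye distance
    y0, y1 = max(cy - d, 0), min(cy + 2 * d, img.shape[0])
    x0, x1 = max(cx - d, 0), min(cx + d, img.shape[1])
    crop = img[y0:y1, x0:x1]
    # Nearest-neighbor resample to the target 150x81 grid
    rows = np.linspace(0, crop.shape[0] - 1, out_hw[0]).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, out_hw[1]).astype(int)
    return crop[np.ix_(rows, cols)]

# Usage: a synthetic 300x200 frame with eyes at (80,100) and (120,100)
frame = np.arange(300 * 200, dtype=float).reshape(300, 200)
face = crop_and_scale(frame, ((80, 100), (120, 100)))
```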
Figure 6 illustrates five distinct experimental scenarios: normal (original images), noise (with noise), glasses (in glasses), face mask (in face mask), and both (in glasses and face mask). The normal image is depicted in 6-a. Gaussian noise, a common noise type in thermal images [24], is introduced to the thermal images, as depicted in 6-b. Glasses and face masks obstruct the thermal emission from the facial regions they cover, as depicted in 6-c, 6-d, and 6-e.
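The Gaussian-noise scenario can be reproduced with a few lines of NumPy; the noise standard deviation is a free parameter here, as the experiments' value is not stated:

```python
import numpy as np

def add_gaussian_noise(thermal_img, sigma=5.0, seed=None):
    """Add zero-mean Gaussian noise to an 8-bit thermal image, a common
    degradation model for thermal sensors; sigma is an assumed value."""
    rng = np.random.default_rng(seed)
    noisy = thermal_img.astype(float) + rng.normal(0.0, sigma, thermal_img.shape)
    # Clip back into the valid 8-bit intensity range
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Usage: degrade a flat 150x81 test image
img = np.full((150, 81), 128, dtype=np.uint8)
noisy = add_gaussian_noise(img, sigma=5.0, seed=0)
```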

Face Recognition Experiments
In our experiments, we considered two key parameters. First, the parameter F denotes the number of feature vectors, where we compared F = 12 and F = 22. Second, we utilized the parameter N to indicate the number of images per individual used for training. For instance, setting N = 30 implies that we employed the first 30 images from the databases for training and tested against the remaining 20 images. The objective of these experiments was to thoroughly analyze the performance of different feature vectors in various scenarios, including normal conditions (original images), images with added noise, images of individuals wearing glasses, images of individuals wearing face masks, and scenarios involving both glasses and face masks. Tables 2 and 3 present the outcomes obtained with the RF model using different features. The results of the SVM model are displayed in Tables 4 and 5, while the CNN model's performance is detailed in Table 6. The results indicate that, under conditions involving glasses, a mask, or both, superior performance is observed when F = 12. In all other experiments, F = 22 demonstrates better performance, particularly under normal conditions.
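The first-N-images-per-individual split described above can be sketched as follows, assuming the samples are ordered by individual with a fixed number of images each:

```python
import numpy as np

def split_first_n(X, y, n_train=30, n_per_id=50):
    """Use the first n_train images of each individual for training and
    the remaining n_per_id - n_train for testing, matching the role of
    the N parameter in the experiments."""
    train_idx, test_idx = [], []
    for start in range(0, len(X), n_per_id):
        train_idx.extend(range(start, start + n_train))
        test_idx.extend(range(start + n_train, start + n_per_id))
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy data: 3 individuals with 50 feature vectors each (4 features)
X = np.arange(150 * 4, dtype=float).reshape(150, 4)
y = np.repeat(np.arange(3), 50)
Xtr, ytr, Xte, yte = split_first_n(X, y, n_train=30, n_per_id=50)
```

With N = 30 and 50 images per person, this yields a 30/20 train/test split per individual, as in the experiments.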
Table 7 displays the outcomes achieved with the RF, SVM, and CNN models on the PUCV-DTF database, while Table 8 showcases the results for the UCH-TTF database. In the experiments reported in Tables 2 to 6, a sample size of N = 30 was employed. The findings suggest that, within the PUCV-DTF database, both traditional and CNN-based methods demonstrate strong performance. In contrast, within the UCH-TTF database, the CNN-based method emerges as the superior performer.

CONCLUSIONS
In this article, we have presented a comparative study of traditional and CNN-based approaches to face recognition via thermal imaging. These methods encompass two primary components: traditional face recognition, where we employed Random Forest (RF) and Support Vector Machine (SVM) models for individual identification, and CNN-based face recognition, featuring the application of a Convolutional Neural Network (CNN) model.
Recognizing the prevalence of masks in daily life, especially in the context of the Covid-19 pandemic, we expanded our experiments to simulate real-world scenarios. This entailed the introduction of various experimental image categories, including images of individuals wearing face masks. The results of these experiments convincingly demonstrated the effectiveness of our proposed face recognition methods, even when confronted with partial face coverage such as glasses and face masks.

Figure 1: Flowchart of the proposed traditional face recognition approach
Figure 2: Thermal points selected on the face: (a) averaged 3x3 pixel blocks, (b) the 12 positions typically not obscured by glasses or a face mask

Figure 3 illustrates the proposed flowchart for CNN-based face recognition, which encompasses two distinct phases: training and testing. The training phase involves two fundamental steps: preprocessing and classification. To prepare the thermal face images for CNN model training, we utilize the Bayesian framework [16] to extract faces from their backgrounds. These extracted face images are then normalized to a consistent size of 81x150 pixels.

Figure 3 :
Figure 3: Flowchart of the proposed CNN-based face recognition approach

Figure 6 :
Figure 6: Original and modified experiment samples: (a) Normal images, (b) Noise images, (c) Glasses images, (d) Face mask images, (e) Glasses and face mask images

Table 2 :
Recognition rate (%) achieved by RF model in PUCV-DTF database

Table 3 :
Recognition rate (%) achieved by RF model in UCH-TTF database

Table 4 :
Recognition rate (%) achieved by SVM model in PUCV-DTF database

Table 5 :
Recognition rate (%) achieved by SVM model in UCH-TTF database

Table 6 :
Recognition rate (%) achieved by CNN model