Abstract
In the absence of vaccines or medicines to stop COVID-19, wearing a face mask is one of the effective ways to slow the spread of the coronavirus and reduce the burden on healthcare systems. However, enforcing the use of face masks or coverings in public areas requires additional human resources, and the monitoring work is tedious and attention-intensive. A promising way to automate this monitoring is to leverage existing object detection models to detect faces with and without masks. Security officers then no longer have to watch monitoring screens or crowds continuously and only need to respond to alerts triggered when a face without a mask is detected. Existing object detection models usually focus on designing CNN-based network architectures that extract discriminative features. However, training datasets for face mask detection are small, and the difference between faces with and without masks is subtle. Therefore, in this article, we propose a face mask detection framework that uses a context attention module to make the attention of the feed-forward convolutional neural network more effective by adaptively refining features through its attention maps. Moreover, we propose an anchor-free detector with Triplet-Consistency Representation Learning, which integrates a consistency loss and a triplet loss to cope with the small-scale training data and the similarity between masks and other occlusions. Extensive experimental results show that our method outperforms other state-of-the-art methods. To support public health efforts, the source code is publicly available at https://github.com/wei-1006/MaskFaceDetection.
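The abstract describes combining a triplet loss with a consistency loss on top of the detection objective. The block below is a minimal sketch of how such terms are commonly combined, not the authors' implementation (the released repository is the authoritative source). It assumes plain lists of floats for embeddings and class scores, Euclidean distance for the triplet term, mean squared error between an image and an augmented view of it for the consistency term, and hypothetical weights `w_tri` and `w_con`.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Standard triplet margin loss: pull the anchor toward a same-class
    positive (e.g. another masked face) and push it away from a negative
    (e.g. an unmasked face or a non-mask occlusion) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def consistency_loss(pred_orig, pred_aug) -> float:
    """Mean squared error between class scores predicted for an image and
    for an augmented view of it (assumed here, e.g. a horizontal flip)."""
    if len(pred_orig) != len(pred_aug):
        raise ValueError("prediction vectors must have the same length")
    return sum((p - q) ** 2 for p, q in zip(pred_orig, pred_aug)) / len(pred_orig)


def total_loss(det_loss, anchor, positive, negative, pred_orig, pred_aug,
               w_tri: float = 1.0, w_con: float = 1.0, margin: float = 1.0) -> float:
    """Hypothetical weighted sum of a detection loss and the two auxiliary terms.
    The paper's exact formulation and weighting may differ."""
    return (det_loss
            + w_tri * triplet_loss(anchor, positive, negative, margin)
            + w_con * consistency_loss(pred_orig, pred_aug))


# Usage: a negative that is as close to the anchor as the positive is penalized.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
print(total_loss(0.5, [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.7, 0.3]))
```

The triplet term targets the fine-grained separation between masks and look-alike occlusions, while the consistency term adds a training signal that does not depend on extra labels, which is one common way to stretch a small dataset.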
Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning