Abstract
Facial Expression Recognition (FER) in the wild is challenging because real-world facial images contain occlusions and vary in illumination, scale, and head pose. In this article, we propose an Edge-AI-driven framework for FER. On the algorithmic side, we propose two attention modules, Arbitrary-oriented Spatial Pooling (ASP) and Scalable Frequency Pooling (SFP), which improve feature extraction and thereby classification accuracy. On the systems side, we propose an edge-cloud joint inference architecture for low-latency FER: a lightweight backbone network runs on the edge device, and the two optional attention modules are partially offloaded to the cloud. Performance evaluation shows that our approach achieves a good trade-off between classification accuracy and inference latency.
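The edge-cloud split described in the abstract can be illustrated with a minimal placement sketch. This is not the paper's scheduler. It assumes a simple additive latency model at per-module granularity: the backbone always runs on the edge, and each optional attention module is either run on the edge, offloaded (paying a round trip plus a feature-map upload), or skipped when it would exceed a latency budget. The class `OptionalModule`, the function `plan_inference`, and every latency, bandwidth, and size figure below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OptionalModule:
    """An optional attention module that can run on the edge, in the cloud, or be skipped."""
    name: str
    edge_ms: float    # hypothetical compute time on the edge device
    cloud_ms: float   # hypothetical compute time on the cloud server
    upload_kb: float  # hypothetical size (kilobytes) of the feature map to upload


def plan_inference(
    backbone_ms: float,
    modules: List[OptionalModule],
    bandwidth_kbps: float,
    rtt_ms: float,
    budget_ms: float,
) -> Tuple[List[Tuple[str, str]], float]:
    """Greedily place each optional module on the edge or in the cloud,
    skipping it if the cheaper placement does not fit in the latency budget."""
    total_ms = backbone_ms  # the lightweight backbone always runs on the edge
    plan: List[Tuple[str, str]] = []
    for m in modules:
        # Offloading cost: round trip + feature-map transfer + cloud compute.
        transfer_ms = m.upload_kb * 8.0 * 1000.0 / bandwidth_kbps
        cloud_cost = rtt_ms + transfer_ms + m.cloud_ms
        if m.edge_ms <= cloud_cost:
            placement, cost = "edge", m.edge_ms
        else:
            placement, cost = "cloud", cloud_cost
        if total_ms + cost <= budget_ms:
            plan.append((m.name, placement))
            total_ms += cost
        else:
            plan.append((m.name, "skip"))  # fall back to backbone-only features
    return plan, total_ms


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise both placements.
    modules = [
        OptionalModule("ASP", edge_ms=15.0, cloud_ms=2.0, upload_kb=64.0),
        OptionalModule("SFP", edge_ms=40.0, cloud_ms=3.0, upload_kb=16.0),
    ]
    plan, total = plan_inference(20.0, modules, bandwidth_kbps=8000.0,
                                 rtt_ms=20.0, budget_ms=80.0)
    print(plan, total)  # [('ASP', 'edge'), ('SFP', 'cloud')] 74.0
```

Tightening the budget (for example to 50 ms) makes the planner skip the second module and return a backbone-plus-ASP result, which mirrors the "optional" role the abstract gives the attention modules. A real system would also need to account for concurrency, variable bandwidth, and sub-module partitioning, which this sketch ignores.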
Edge-AI-Driven Framework with Efficient Mobile Network Design for Facial Expression Recognition