Abstract
Facial landmark detection aims to locate keypoints in facial images, which typically exhibit variations caused by arbitrary pose, diverse facial expressions, and partial occlusion. In this article, we propose a coarse-to-fine framework that combines a stacked hourglass network with salient region attention refinement for robust face alignment. We first present a multi-scale region learning module that analyzes the structural information of different facial regions and extracts strongly discriminative deep features. We then employ a stacked hourglass network for heatmap regression and initial facial landmark prediction. Specifically, the stacked hourglass network uses an improved Inception-ResNet unit as its basic building block, which effectively enlarges the receptive field and learns contextual feature representations. In addition, a novel loss function takes both global and local weights into account to make the heatmap regression more accurate. Unlike existing heatmap regression models, we present a salient region attention refinement module that extracts precise features based on the heatmap regression and uses the filtered features for landmark refinement to achieve accurate prediction. Extensive experiments on several challenging datasets (including 300 Faces in the Wild, Caltech Occluded Faces in the Wild, and Annotated Facial Landmarks in the Wild) confirm that our approach achieves more competitive performance than state-of-the-art algorithms.
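To make the heatmap regression step concrete, the following is a minimal sketch of how a landmark can be encoded as a Gaussian heatmap, decoded back from the heatmap peak, and scored with a pixel-weighted loss. The abstract does not define the loss, so the weighting scheme here (a global weight on every pixel plus an extra local weight near the landmark), the function names, and all parameter values are assumptions for illustration only, not the formulation used in the article.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.5):
    """Render one landmark at (cx, cy) as a 2D Gaussian target heatmap."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_landmark(heatmap):
    """Return the (x, y) position of the heatmap peak."""
    best_val, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_val, best_xy = val, (x, y)
    return best_xy


def weighted_mse(pred, target, global_weight=1.0, local_weight=4.0,
                 threshold=0.2):
    """Mean squared error with per-pixel weights.

    Hypothetical weighting (an assumption, not the article's loss):
    every pixel receives global_weight, and pixels close to the landmark
    (target value above threshold) receive an additional local_weight.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            weight = global_weight + (local_weight if t > threshold else 0.0)
            total += weight * (p - t) ** 2
            count += 1
    return total / count


# Encode a landmark at x=5, y=9 on a 16x16 grid and compare it with an
# all-zero prediction.
target = gaussian_heatmap(16, 16, cx=5, cy=9)
flat = [[0.0] * 16 for _ in range(16)]
peak = decode_landmark(target)
loss = weighted_mse(flat, target)
```

In this sketch, raising `local_weight` penalizes errors near the landmark more heavily than errors in the background, which is one plausible reading of how local weights could sharpen the regressed heatmap.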
Index Terms
Joint Stacked Hourglass Network and Salient Region Attention Refinement for Robust Face Alignment