Abstract
Face alignment is a key component of numerous face analysis tasks. In recent years, most existing methods have focused on designing high-performance face alignment systems and paid less attention to efficiency. However, more face alignment systems are now deployed on low-cost devices, such as mobile phones. In this article, we design a general, efficient framework that can be paired with any face alignment regression network and improves overall performance at nearly no extra computational cost. First, we discover that the largest regression error occurs on the face contour, where landmarks do not have distinct semantic positions and are thus labeled randomly along the face contour in the training data. To address this problem, we propose a novel contour fitting loss that dynamically adjusts the regression target during training, so that the network can learn more accurate semantic meanings of the contour landmarks and achieve better localization performance. Second, we decouple the complex sample variations in the face alignment task and propose a Fast Normalization Module (FNM) to efficiently normalize the considerable variations that can be described by geometric transformations. Finally, we propose a new lightweight network architecture, named Lightweight Alignment Module (LAM), to achieve fast and precise face alignment on mobile devices. Our method achieves performance competitive with state-of-the-art methods on the 300W and AFLW2000-3D benchmarks, while our framework is significantly faster than other CNN-based approaches.
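To make the idea of a dynamically adjusted regression target concrete, the following is a minimal sketch under stated assumptions, not the formulation used in this article. It assumes that the labeled contour landmarks are joined into a polyline and that each predicted contour landmark is regressed toward its closest point on that polyline rather than toward a fixed labeled point. The function names `closest_point_on_segment`, `closest_point_on_polyline`, and `contour_fitting_loss` are hypothetical and chosen for illustration only.

```python
import math


def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p (2-D tuples)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return (ax, ay)  # degenerate segment: both endpoints coincide
    # Projection parameter of p onto the line through a and b,
    # clamped to [0, 1] so the result stays on the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def closest_point_on_polyline(p, polyline):
    """Return the point on the polyline (list of 2-D tuples) closest to p."""
    best, best_dist = None, math.inf
    for a, b in zip(polyline, polyline[1:]):
        q = closest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if d < best_dist:
            best, best_dist = q, d
    return best


def contour_fitting_loss(pred_contour, labeled_contour):
    """Mean distance from each predicted landmark to the labeled polyline.

    Assumption (illustrative only): the regression target of each
    prediction is re-chosen as its nearest point on the labeled contour,
    so sliding along the contour is not penalized.
    """
    total = 0.0
    for p in pred_contour:
        target = closest_point_on_polyline(p, labeled_contour)
        total += math.dist(p, target)
    return total / len(pred_contour)


if __name__ == "__main__":
    labeled = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    on_contour = [(0.5, 0.0), (1.5, 0.0)]   # shifted along the contour
    off_contour = [(0.5, 1.0)]              # displaced away from the contour
    print(contour_fitting_loss(on_contour, labeled))
    print(contour_fitting_loss(off_contour, labeled))
```

Under this assumption, a prediction that lies on the labeled contour but between two annotated points incurs no penalty, whereas a fixed point-to-point loss would penalize it. That matches the motivation stated above, namely that contour landmarks have no distinct semantic position along the contour.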
Efficient Face Alignment with Fast Normalization and Contour Fitting Loss