Abstract
In this article, we propose a two-stage refinement network for facial landmark detection under unconstrained conditions. The model consists of two modules: the Head Attribute Classifier (HAC) module and the Domain-Specific Refinement (DSR) module. Given an input facial image, HAC adopts a multi-task learning mechanism to detect the head pose and obtain an initial shape. Based on the predicted head pose, DSR employs three CNN-based refinement networks, each trained on a specific domain, and automatically selects the most suitable network for landmark refinement. Unlike existing two-stage models, HAC combines head pose prediction with facial landmark estimation. This improves the accuracy of head pose prediction and yields a robust initial shape. Moreover, the DSR module applies an adaptive sub-network training strategy. This strategy effectively addresses a weakness of traditional multi-view methods, in which an improperly selected sub-network may cause alignment failure. Extensive experiments on two public datasets, AFLW and 300W, confirm the effectiveness of our model.
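At inference time, the pipeline described above works as a dispatch. HAC predicts a head-pose class and an initial shape. DSR then passes that shape to the refinement network trained for the predicted pose domain. The Python sketch below covers only this selection logic and rests on several assumptions. The three domain names, the `HACOutput` container, and the functions `select_domain` and `align` are hypothetical. The stub callables stand in for the CNNs. The sketch does not model the adaptive sub-network training strategy, because the abstract does not give its details.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]

# Hypothetical labels: the abstract says there are three pose domains but does not name them.
POSE_DOMAINS = ("left_profile", "frontal", "right_profile")


@dataclass
class HACOutput:
    """Hypothetical container for the two HAC outputs."""
    pose_probs: Sequence[float]  # one probability per pose domain
    initial_shape: Shape         # coarse landmark estimate


# A refiner maps (image, initial shape) to a refined shape; stands in for a domain-specific CNN.
RefineFn = Callable[[object, Shape], Shape]


def select_domain(pose_probs: Sequence[float]) -> str:
    """Pick the pose domain with the highest predicted probability."""
    if len(pose_probs) != len(POSE_DOMAINS):
        raise ValueError("expected one probability per pose domain")
    best = max(range(len(pose_probs)), key=lambda i: pose_probs[i])
    return POSE_DOMAINS[best]


def align(image: object,
          hac: Callable[[object], HACOutput],
          refiners: Dict[str, RefineFn]) -> Tuple[str, Shape]:
    """Stage 1: run HAC. Stage 2: refine with the sub-network of the predicted domain."""
    out = hac(image)
    domain = select_domain(out.pose_probs)
    refined = refiners[domain](image, out.initial_shape)
    return domain, refined


# --- Toy usage with stub models (not trained networks) ---
def toy_hac(image: object) -> HACOutput:
    return HACOutput(pose_probs=[0.1, 0.2, 0.7],
                     initial_shape=[(10.0, 10.0), (20.0, 12.0)])


def make_offset_refiner(dx: float, dy: float) -> RefineFn:
    """Stub refiner that shifts every landmark by a fixed offset."""
    def refine(image: object, shape: Shape) -> Shape:
        return [(x + dx, y + dy) for x, y in shape]
    return refine


refiners = {d: make_offset_refiner(dx, 0.0)
            for d, dx in zip(POSE_DOMAINS, (-1.0, 0.0, 1.0))}

if __name__ == "__main__":
    domain, shape = align(None, toy_hac, refiners)
    print(domain, shape)  # right_profile [(11.0, 10.0), (21.0, 12.0)]
```

The sketch uses a hard argmax selection. That choice exposes the failure mode the abstract mentions: if HAC picks the wrong domain, the wrong refiner runs. According to the abstract, the mitigation comes from the adaptive training strategy, not from the selection rule itself.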
Joint Head Attribute Classifier and Domain-Specific Refinement Networks for Face Alignment