Abstract
Facial landmark detection based on convolutional neural networks has made great progress in recent years, but it remains challenging under partial occlusion and extreme head pose. In this paper, we propose a Cascaded Structure-Learning Network (CSLN) that uses adversarial training to improve 2D facial landmark detection by taking the structure of facial landmarks into account. In the first stage, we improve the original stacked hourglass network with three components: a multi-branch module that captures features at different scales, a progressive convolution structure that compensates for the structural features missing in hourglass networks, and a pyramid inception structure that expands the receptive field. In particular, we introduce a discriminator and use an adversarial training strategy to push the improved hourglass network to generate more accurate heatmaps. The second stage, which is based on an attention mechanism, optimizes the spatial correlations between different facial landmarks by reusing the structural features. Moreover, we propose a novel region loss that adaptively assigns weights to different regions, so that the network can focus more on occluded landmarks. Experimental results on the 300W, COFW, and AFLW datasets show that the proposed method achieves superior performance compared with state-of-the-art methods.
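The abstract does not give the form of the region loss, so the following is only a rough sketch of one way a loss could assign adaptive weights to facial regions. Everything in it is an assumption made for illustration: the function name `region_weighted_loss`, the grouping of landmarks into named regions, and the rule that a region's weight is proportional to its current heatmap error. It should not be read as the CSLN formulation.

```python
import numpy as np


def region_weighted_loss(pred, target, regions, eps=1e-8):
    """Region-weighted MSE over landmark heatmaps (hypothetical sketch).

    pred, target: arrays of shape (K, H, W), one heatmap per landmark.
    regions: dict mapping a region name to a list of landmark indices.
    Returns the scalar loss and the weight assigned to each region.
    """
    # Mean squared error of each landmark heatmap, shape (K,).
    per_landmark = ((pred - target) ** 2).mean(axis=(1, 2))

    # Average the landmark errors inside each region.
    names = list(regions)
    region_err = np.array([per_landmark[regions[n]].mean() for n in names])

    # Assumed weighting rule: weight proportional to the region's error,
    # normalised so the weights sum to (almost exactly) one.
    weights = region_err / (region_err.sum() + eps)

    loss = float((weights * region_err).sum())
    return loss, dict(zip(names, weights))


# Toy usage: the "mouth" landmarks are badly predicted, the "eyes" are exact.
rng = np.random.default_rng(0)
target = rng.random((4, 8, 8))
pred = target.copy()
pred[2:] += 0.5
regions = {"eyes": [0, 1], "mouth": [2, 3]}
loss, weights = region_weighted_loss(pred, target, regions)
```

In this toy case nearly all of the weight goes to the poorly predicted region, which is the behaviour the abstract describes as focusing on occluded landmarks. In a gradient-based training loop the weights would typically be treated as constants when differentiating; that is also an assumption here.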
Cascaded Structure-Learning Network with Using Adversarial Training for Robust Facial Landmark Detection