research-article

Cascaded Structure-Learning Network with Using Adversarial Training for Robust Facial Landmark Detection

Published: 16 February 2022

Abstract

Convolutional neural networks have recently driven great progress in facial landmark detection, yet the task remains challenging under partial occlusion and extreme head pose. In this paper, we propose a Cascaded Structure-Learning Network (CSLN) that uses adversarial training to improve 2D facial landmark detection by taking the structure of facial landmarks into account. In the first stage, we improve the original stacked hourglass network with a multi-branch module that captures features at different scales, a progressive convolution structure that compensates for the structural features missing in hourglass networks, and a pyramid inception structure that expands the receptive field. In particular, by introducing a discriminator, we use an adversarial training strategy to push the improved hourglass network to generate more accurate heatmaps. The second stage, which is based on an attention mechanism, optimizes the spatial correlations between different facial landmarks by reusing the structural features. Moreover, we propose a novel region loss that adaptively allocates proper weights to different regions, so that the network can focus more on occluded landmarks. Experimental results on several datasets, i.e., 300W, COFW, and AFLW, show that our proposed method achieves superior performance compared with state-of-the-art methods.
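
The abstract does not state the form of the heatmaps or of the region loss, so the following is only an illustrative sketch and not the authors' formulation. It assumes the common heatmap convention (one Gaussian map per landmark, decoded by taking the maximum) and a hypothetical weighting in which each facial region's weight is a softmax over its mean heatmap error, so that regions with larger error (for example, occluded ones) contribute more. The function names, the region grouping, and the softmax choice are all assumptions made for illustration.

```python
import math

def gaussian_heatmap(cx, cy, size=16, sigma=1.5):
    """Render one landmark at (cx, cy) as a size x size Gaussian heatmap."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def decode(heatmap):
    """Return the (x, y) position of the heatmap's maximum value."""
    best = max((v, x, y) for y, row in enumerate(heatmap)
               for x, v in enumerate(row))
    return best[1], best[2]

def mse(pred, target):
    """Mean squared error between two heatmaps of equal size."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for prow, trow in zip(pred, target)
               for p, t in zip(prow, trow)) / n

def region_weighted_loss(pred_maps, target_maps, regions):
    """Hypothetical region loss (assumption, not the paper's definition).

    regions maps a region name to the indices of its landmarks. Each
    region's weight is a softmax over the mean per-region error.
    """
    errors = {name: sum(mse(pred_maps[i], target_maps[i]) for i in idx) / len(idx)
              for name, idx in regions.items()}
    z = sum(math.exp(e) for e in errors.values())
    weights = {name: math.exp(e) / z for name, e in errors.items()}
    loss = sum(weights[name] * errors[name] for name in regions)
    return loss, weights

# Toy example: two eye landmarks predicted exactly, one mouth landmark
# predicted 2 pixels away from its target.
targets = [gaussian_heatmap(4, 4), gaussian_heatmap(11, 4), gaussian_heatmap(8, 12)]
preds = [targets[0], targets[1], gaussian_heatmap(10, 12)]
regions = {"eyes": [0, 1], "mouth": [2]}
loss, weights = region_weighted_loss(preds, targets, regions)
```

In this toy case the mouth region receives the larger weight because it is the only region with nonzero error. In an actual training loop the weights would typically be treated as constants when computing gradients; that detail is omitted here.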



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 2
        May 2022
        494 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3505207

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 February 2022
        • Revised: 1 July 2021
        • Accepted: 1 July 2021
        • Received: 1 July 2020


        Qualifiers

        • research-article
        • Refereed