research-article

Joint Stacked Hourglass Network and Salient Region Attention Refinement for Robust Face Alignment

Published: 17 February 2020

Abstract

Facial landmark detection aims to locate keypoints in facial images, which typically suffer from variations caused by arbitrary pose, diverse facial expressions, and partial occlusion. In this article, we propose a coarse-to-fine framework that combines a stacked hourglass network with salient region attention refinement for robust face alignment. We first present a multi-scale region learning module that analyzes the structural information of different facial regions and extracts strongly discriminative deep features. We then employ a stacked hourglass network for heatmap regression and initial facial landmark prediction. Specifically, the stacked hourglass network uses an improved Inception-ResNet unit as its basic building block, which effectively enlarges the receptive field and learns contextual feature representations. In addition, a novel loss function that accounts for both global and local weights makes the heatmap regression more accurate. Unlike existing heatmap regression models, we present a salient region attention refinement module that extracts precise features based on the heatmap regression and uses the filtered features for landmark refinement to achieve accurate prediction. Extensive experiments on several challenging datasets (including 300 Faces in the Wild, Caltech Occluded Faces in the Wild, and Annotated Facial Landmarks in the Wild) confirm that our approach achieves more competitive performance than state-of-the-art algorithms.
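To make the heatmap regression idea concrete, the following is a minimal sketch in plain Python and not the article's implementation. It renders a landmark as a Gaussian heatmap, decodes it again by taking the peak, and computes a weighted squared error. The abstract does not give the form of the loss, so the function names, `sigma`, `threshold`, and the way the global and local weights are combined here are all illustrative assumptions.

```python
import math


def landmark_to_heatmap(x, y, size=64, sigma=1.5):
    """Render one landmark at (x, y) as a size x size Gaussian heatmap.

    Rows index y and columns index x. `size` and `sigma` are assumed values.
    """
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2.0 * sigma ** 2))
         for c in range(size)]
        for r in range(size)
    ]


def heatmap_to_landmark(heatmap):
    """Decode a heatmap to (x, y) by locating its maximum response."""
    best_val, best_xy = float("-inf"), (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (c, r)
    return best_xy


def weighted_heatmap_loss(pred, target, global_weight=1.0,
                          local_weight=10.0, threshold=0.2):
    """Hypothetical weighted mean squared error between two heatmaps.

    Every pixel receives `global_weight`; pixels whose target value exceeds
    `threshold` (that is, pixels near the landmark) receive an additional
    `local_weight`. This weighting scheme is an assumption for illustration.
    """
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            w = global_weight + (local_weight if t > threshold else 0.0)
            total += w * (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    target = landmark_to_heatmap(20, 30)
    print(heatmap_to_landmark(target))
    blank = [[0.0] * 64 for _ in range(64)]
    print(weighted_heatmap_loss(blank, target))
```

In this sketch, the extra local weight makes errors close to the landmark cost more than errors in the background, which is one plausible reading of how combined global and local weighting could sharpen the regressed peak.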
