research-article

Joint Head Attribute Classifier and Domain-Specific Refinement Networks for Face Alignment

Published: 10 October 2018

Abstract

In this article, a two-stage refinement network is proposed for facial landmark detection under unconstrained conditions. Our model consists of two modules: the Head Attribute Classifier (HAC) module and the Domain-Specific Refinement (DSR) module. Given an input facial image, HAC adopts a multi-task learning mechanism to detect the head pose and obtain an initial shape. Based on the obtained head pose, DSR uses three CNN-based refinement networks, each trained on a specific domain, and automatically selects the most appropriate network for landmark refinement. Unlike existing two-stage models, HAC combines head pose prediction with facial landmark estimation, which improves the accuracy of head pose prediction and yields a robust initial shape. Moreover, an adaptive sub-network training strategy applied in the DSR module effectively addresses a weakness of traditional multi-view methods, in which an improperly selected sub-network may cause alignment failure. Extensive experimental results on two public datasets, AFLW and 300W, confirm the validity of our model.
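
The two-stage control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the domain names (left, frontal, right), the function names, and the placeholder numbers are all assumptions made for illustration, and the CNNs are replaced by trivial stand-ins. The abstract states only that there are three domain-specific refinement networks and that one is selected from the predicted head pose.

```python
from typing import Callable, Dict, List, Tuple

Shape = List[Tuple[float, float]]  # landmark (x, y) coordinates

# Hypothetical pose domains; the abstract says only that there are three.
DOMAINS = ("left", "frontal", "right")


def head_attribute_classifier(image) -> Tuple[List[float], Shape]:
    """Stand-in for HAC: returns pose-class scores and an initial shape.

    A real HAC would be a multi-task CNN; the values here are placeholders.
    """
    pose_scores = [0.1, 0.7, 0.2]
    initial_shape = [(0.3, 0.4), (0.7, 0.4), (0.5, 0.6)]
    return pose_scores, initial_shape


def make_refiner(dx: float) -> Callable[[object, Shape], Shape]:
    """Stand-in for one refinement network: shifts each landmark's x by dx."""
    def refine(image, shape: Shape) -> Shape:
        return [(x + dx, y) for x, y in shape]
    return refine


# One (hypothetical) refinement network per pose domain.
REFINERS: Dict[str, Callable[[object, Shape], Shape]] = {
    "left": make_refiner(-0.01),
    "frontal": make_refiner(0.0),
    "right": make_refiner(0.01),
}


def align(image, hac=head_attribute_classifier, refiners=REFINERS):
    """Stage 1: predict pose and initial shape. Stage 2: refine with the
    network belonging to the highest-scoring pose domain."""
    pose_scores, shape = hac(image)
    best = max(range(len(DOMAINS)), key=lambda i: pose_scores[i])
    domain = DOMAINS[best]
    return domain, refiners[domain](image, shape)


if __name__ == "__main__":
    domain, landmarks = align(image=None)
    print(domain, landmarks)
```

Selecting the network with a hard argmax is the simplest possible choice here. The adaptive sub-network training strategy mentioned in the abstract is not modeled, since the abstract does not describe how it works.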

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 4
        Special Section on Deep Learning for Intelligent Multimedia Analytics
        November 2018
        221 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3282485

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 October 2018
        • Accepted: 1 July 2018
        • Revised: 1 June 2018
        • Received: 1 March 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
