research-article

Efficient Face Alignment with Fast Normalization and Contour Fitting Loss

Published: 01 November 2019

Abstract

Face alignment is a key component of numerous face analysis tasks. In recent years, most methods have focused on designing high-performance face alignment systems and paid less attention to efficiency. However, more face alignment systems are now deployed on low-cost devices, such as mobile phones. In this article, we design a general, efficient framework that can be paired with any face alignment regression network and improve overall performance at nearly no extra computational cost. First, we observe that the largest regression error occurs on the face contour, where landmarks do not have distinct semantic positions and are thus randomly labeled along the face contour in training data. To address this problem, we propose a novel contour fitting loss that dynamically adjusts the regression target during training so that the network can learn more accurate semantic meanings of the contour landmarks and achieve better localization performance. Second, we decouple the complex sample variations in the face alignment task and propose a Fast Normalization Module (FNM) to efficiently normalize the considerable variations that can be described by geometric transformations. Finally, we propose a new lightweight network architecture, the Lightweight Alignment Module (LAM), to achieve fast and precise face alignment on mobile devices. Our method achieves performance competitive with state-of-the-art methods on the 300W and AFLW2000-3D benchmarks, while running significantly faster than other CNN-based approaches.
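
The abstract does not give the formulation of the contour fitting loss, so the following is only a minimal sketch of one possible reading of "dynamically adjusts the regression target". It assumes the labeled contour landmarks are joined into a polyline, and that each predicted contour landmark is regressed toward its closest point on that polyline rather than toward its fixed label. The function names (`closest_point_on_segment`, `contour_fitting_target`, `contour_fitting_loss`) and the squared-distance penalty are hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Return the point on segment ab that is closest to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return a
    # Projection parameter of p onto the line through a and b, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy)


def contour_fitting_target(pred: Point, contour: Sequence[Point]) -> Point:
    """Assumed target adjustment: the closest point to pred on the contour polyline."""
    if len(contour) < 2:
        raise ValueError("contour needs at least two labeled landmarks")
    best: Optional[Point] = None
    best_dist_sq = float("inf")
    for a, b in zip(contour, contour[1:]):
        q = closest_point_on_segment(pred, a, b)
        dist_sq = (q[0] - pred[0]) ** 2 + (q[1] - pred[1]) ** 2
        if dist_sq < best_dist_sq:
            best, best_dist_sq = q, dist_sq
    assert best is not None
    return best


def contour_fitting_loss(preds: Sequence[Point], contour: Sequence[Point]) -> float:
    """Mean squared distance from each predicted landmark to its adjusted target."""
    total = 0.0
    for p in preds:
        q = contour_fitting_target(p, contour)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(preds)


if __name__ == "__main__":
    labeled_contour = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
    predictions = [(1.0, 0.5), (3.0, 1.0)]
    print(contour_fitting_loss(predictions, labeled_contour))
```

Under this reading, a prediction that lies anywhere on the labeled contour incurs zero loss, so the network is not penalized for sliding along the contour, only for leaving it. A training implementation would express the same computation in a differentiable tensor framework.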



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 3s
        Special Issue on Face Analysis for Applications and Special Issue on Affective Computing for Large-Scale Heterogeneous Multimedia Data
        November 2019
        304 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3368027

        Copyright © 2019 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 November 2019
        • Accepted: 1 May 2019
        • Revised: 1 March 2019
        • Received: 1 October 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
