Abstract
Appearance variations pose many difficulties for face image analysis. To address this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. For each type of face information, namely shape and texture, we construct a unified tensor model that captures all relevant appearance variations. This contrasts with the variation-specific models of the classical tensor AAM. To achieve unification across pose variations, we propose a strategy for dealing with self-occluded faces that yields consistent shape and texture representations of pose-varied faces. In addition, UT-AAM can be constructed from an incomplete training dataset by using tensor completion methods. Finally, we use an effective cascaded-regression-based method for UT-AAM fitting. With these advances, the practical utility of UT-AAM is considerably enhanced. As an example, we demonstrate improvements in training facial landmark detectors when UT-AAM is used to synthesise a large number of virtual samples. Experimental results obtained on a number of well-known face datasets demonstrate the merits of the proposed approach.
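To make the idea of a tensor model of shape concrete, the following is a minimal sketch only. It assumes a higher-order SVD (a Tucker-style factorisation), which the abstract does not specify, and it uses random toy data. The tensor layout (identity, pose, shape coordinates) and the helper names `unfold`, `mode_dot`, `hosvd` and `reconstruct` are hypothetical and are not taken from UT-AAM itself.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Full higher-order SVD: one orthonormal factor per mode plus a core tensor."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the data tensor from the core and the per-mode factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out

# Hypothetical toy data: 5 identities x 3 poses x 8 shape coordinates (4 landmarks).
rng = np.random.default_rng(0)
shapes = rng.standard_normal((5, 3, 8))

core, factors = hosvd(shapes)
approx = reconstruct(core, factors)
```

In this sketch a single core tensor and one factor matrix per mode describe every identity and pose together, which is the sense in which one model covers several variation types. Truncating the factor columns would give a compact approximation, and handling missing entries would need a completion step that is not shown here.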