Abstract
To achieve robust facial attribute estimation, a hierarchical prediction system referred to as the tensor correlation fusion network (TCFN) is proposed. The system comprises feature extraction, correlation excavation among facial attribute features, score fusion, and multi-attribute prediction. Subnetworks (Age-Net, Gender-Net, Race-Net, and Smile-Net) extract the corresponding attribute features, while Main-Net extracts features not only from the input image but also from the corresponding pooling layers of the subnetworks. Dynamic tensor canonical correlation analysis (DTCCA) is proposed to explore the correlation among the features of the different targets in the F7 layers. For the binary classifications of gender, race, and smile, robust decisions are obtained by fusing the results of the subnetworks with those of TCFN. For age prediction, the facial image is first classified into one of several age groups, and an extreme learning machine (ELM) regressor then performs the final age estimation. Experimental results on benchmarks with multiple face attributes (MORPH-II, Adience Benchmark datasets, LAP-2016, and CelebA) show that the proposed approach achieves superior performance compared with state-of-the-art methods.
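The two decision stages named in the abstract, score fusion for the binary attributes and group classification followed by ELM regression for age, can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or the ELM configuration, so the weighted average in `fuse_scores`, the `weight` parameter, the tanh hidden layer, and all names below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fuse_scores(sub_scores, main_scores, weight=0.5):
    """Fuse two class-probability vectors by weighted averaging.

    The averaging rule and `weight` are assumptions; the abstract only
    states that subnetwork and TCFN results are fused.
    """
    sub = np.asarray(sub_scores, dtype=float)
    main = np.asarray(main_scores, dtype=float)
    fused = weight * sub + (1.0 - weight) * main
    return fused, int(np.argmax(fused))


class ELMRegressor:
    """Basic extreme learning machine: a fixed random hidden layer whose
    output weights are solved in closed form by least squares."""

    def __init__(self, n_hidden=32, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights via the Moore-Penrose pseudoinverse of H.
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


def estimate_age(features, group_scores, regressors):
    """Coarse-to-fine age estimation: choose the most likely age group,
    then apply the regressor trained for that group."""
    group = int(np.argmax(group_scores))
    age = regressors[group].predict(features[None, :])[0]
    return group, float(age)


# Synthetic stand-ins for deep features and labels (hypothetical data).
rng = np.random.default_rng(1)
X_young, y_young = rng.normal(size=(20, 4)), rng.uniform(18, 30, size=20)
X_old, y_old = rng.normal(size=(20, 4)), rng.uniform(50, 70, size=20)
regressors = {
    0: ELMRegressor(seed=0).fit(X_young, y_young),
    1: ELMRegressor(seed=1).fit(X_old, y_old),
}

fused, label = fuse_scores([0.9, 0.1], [0.4, 0.6])
print("fused gender scores:", fused, "label:", label)
print("age estimate:", estimate_age(X_old[0], [0.2, 0.8], regressors))
```

Training one regressor per age group keeps each regression problem narrow, which is the usual motivation for a classify-then-regress hierarchy; an error at the group stage, however, cannot be corrected by the regressor that follows.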
Index Terms
Features-Enhanced Multi-Attribute Estimation with Convolutional Tensor Correlation Fusion Network