Abstract
Facial expression synthesis (FES) is a challenging task because expression changes are highly non-linear and depend on facial appearance, while the identity of the person must be well preserved in the synthesized face. In this article, we present a novel U-Net conditional generative adversarial network for FES. The U-Net helps retain properties of the input face, including identity information and facial details. A category condition is added to the U-Net model so that one-to-many expression synthesis can be achieved with a single model. We also design identity-preservation constraints to further guarantee that the identity of the input face is retained in the generated face image. Specifically, we pair the generated output with condition images of other identities for the discriminator, encouraging it to learn the distinctions both between synthesized and natural images and between the input identity and other identities, which improves its discriminating ability. Additionally, we use a triplet loss to keep the generated face images close to those of the same identity by imposing a margin between positive pairs and negative pairs in feature space. Both qualitative and quantitative evaluations are conducted on the Oulu-CASIA NIR&VIS facial expression database, the Radboud Faces Database, and the Karolinska Directed Emotional Faces database, and the experimental results show that our method generates faces with natural and realistic expressions while preserving identity information.
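The identity-preserving triplet loss described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the embedding vectors, margin value, and function names are assumptions. The anchor is the feature embedding of a generated face, the positive is a real face of the same identity, and the negative is a face of a different identity.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss on feature embeddings: pull the
    generated face (anchor) toward a same-identity face (positive),
    push it away from a different-identity face (negative), and
    enforce a margin between the two squared distances."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # same-identity distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # cross-identity distance
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy 2-D embeddings: anchor lies near the positive, far from the negative,
# so the margin is satisfied and the loss vanishes.
a = np.array([[0.10, 0.20]])
p = np.array([[0.12, 0.18]])
n = np.array([[1.00, -1.00]])
print(triplet_loss(a, p, n))  # prints 0.0 (margin satisfied)
```

When the margin is violated (e.g., the generated face drifts toward another identity in feature space), the loss becomes positive and its gradient pulls the generator's output back toward the input identity.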
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.
- Thaddeus Beier and Shawn Neely. 1992. Feature-based image metamorphosis. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’92). 35--42. DOI:https://doi.org/10.1145/133994.134003
- Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. 2003. Expression-invariant 3D face recognition. In Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA’03). 62--69. DOI:https://doi.org/10.1007/3-540-44887-X_8
- Brian Cheung, Jesse A. Livezey, Arjun K. Bansal, and Bruno A. Olshausen. 2015. Discovering hidden factors of variation in deep networks. In Proceedings of the International Conference on Learning Representations (ICLR’15) Workshop.
- Yunjey Choi, Min-Je Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 8789--8797. DOI:https://doi.org/10.1109/CVPR.2018.00916
- Hui Ding, Kumar Sricharan, and Rama Chellappa. 2018. ExprGAN: Facial expression editing with controllable expression intensity. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI’18). 6781--6788.
- Jon Gauthier. 2014. Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter Semester.
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems 27 (NeurIPS’14). 2672--2680.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16). 770--778. DOI:https://doi.org/10.1109/CVPR.2016.90
- Alexander Hermans, Lucas Beyer, and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv:1703.07737.
- Rui Huang, Shu Zhang, Tianyu Li, and Ran He. 2017. Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). 2458--2467. DOI:https://doi.org/10.1109/ICCV.2017.267
- Megvii Inc. [n.d.]. Face++ Research Toolkit. Available at http://www.faceplusplus.com.
- Microsoft Inc. [n.d.]. Microsoft Emotion API. Retrieved September 18, 2019 from https://azure.microsoft.com/en-us/services/cognitive-services/emotion/.
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15), Vol. 37. 448--456.
- Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 5967--5976. DOI:https://doi.org/10.1109/CVPR.2017.632
- Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV’16). 694--711. DOI:https://doi.org/10.1007/978-3-319-46475-6_43
- Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15).
- Neeraj Kumar, Peter N. Belhumeur, and Shree K. Nayar. 2008. FaceTracer: A search engine for large collections of images with faces. In Proceedings of the European Conference on Computer Vision (ECCV’08). 340--353. DOI:https://doi.org/10.1007/978-3-540-88693-8_25
- O. Langner, R. Dotsch, G. Bijlstra, D. H. J. Wigboldus, S. T. Hawk, and A. D. Van Knippenberg. 2010. Presentation and validation of the Radboud Faces Database. Cognition and Emotion 24, 8 (2010), 1377--1388.
- Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, et al. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17). 105--114. DOI:https://doi.org/10.1109/CVPR.2017.19
- Mu Li, Wangmeng Zuo, and David Zhang. 2016. Deep identity-aware transfer of facial attributes. arXiv:1610.05586.
- Runde Li, Jinshan Pan, Zechao Li, and Jinhui Tang. 2018. Single image dehazing via conditional generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 8202--8211. DOI:https://doi.org/10.1109/CVPR.2018.00856
- Xiaoxing Li, Greg Mori, and Hao Zhang. 2006. Expression-invariant face recognition with expression classification. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision (CRV’06). 77. DOI:https://doi.org/10.1109/CRV.2006.34
- James Jenn-Jier Lien, Takeo Kanade, Jeffrey F. Cohn, and Ching-Chung Li. 2000. Detection, tracking, and classification of action units in facial expression. Robotics and Autonomous Systems 31, 3 (2000), 131--146. DOI:https://doi.org/10.1016/S0921-8890(99)00103-7
- Christine L. Lisetti and Diane J. Schiano. 2000. Automatic facial expression interpretation: Where human-computer interaction, artificial intelligence and cognitive science intersect. Pragmatics & Cognition 8, 1 (2000), 185--235.
- X. Liu, B. V. K. Vijaya Kumar, Y. Ge, C. Yang, J. You, and P. Jia. 2018. Normalized face image generation with perceptron generative adversarial networks. In Proceedings of the IEEE 4th International Conference on Identity, Security, and Behavior Analysis (ISBA’18). 1--8. DOI:https://doi.org/10.1109/ISBA.2018.8311462
- X. Liu, B. V. K. Vijaya Kumar, P. Jia, and J. You. 2019. Hard negative generation for identity-disentangled facial expression recognition. Pattern Recognition 88 (2019), 1--12. DOI:https://doi.org/10.1016/j.patcog.2018.11.001
- Zicheng Liu, Ying Shan, and Zhengyou Zhang. 2001. Expressive expression mapping with ratio images. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’01). 271--276. DOI:https://doi.org/10.1145/383259.383289
- Daniel Lundqvist, Anders Flykt, and Arne Öhman. 1998. The Karolinska Directed Emotional Faces (KDEF). CD ROM from the Department of Clinical Neuroscience, Psychology Section, Karolinska Institutet.
- Andrew L. Maas, Awni Y. Hannun, and Andrew Y. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML Workshop on Deep Learning for Audio, Speech, and Language Processing, Vol. 1. 3.
- Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv:1411.1784.
- Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning (ICML’17). 2642--2651.
- Kyle Olszewski, Zimo Li, Chao Yang, Yi Zhou, Ronald Yu, Zeng Huang, Sitao Xiang, Shunsuke Saito, Pushmeet Kohli, and Hao Li. 2017. Realistic dynamic facial textures from a single image using GANs. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). 5439--5448. DOI:https://doi.org/10.1109/ICCV.2017.580
- Frédéric H. Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, and David Salesin. 2005. Synthesizing realistic facial expressions from photographs. In Proceedings of the 32nd International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’05) Courses. 9. DOI:https://doi.org/10.1145/1198555.1198589
- Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, and Francesc Moreno-Noguer. 2018. GANimation: Anatomically-aware facial animation from a single image. In Proceedings of the European Conference on Computer Vision (ECCV’18). 835--851. DOI:https://doi.org/10.1007/978-3-030-01249-6_50
- Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR’16).
- Scott E. Reed, Kihyuk Sohn, Yuting Zhang, and Honglak Lee. 2014. Learning to disentangle factors of variation with manifold interaction. In Proceedings of the 31st International Conference on Machine Learning (ICML’14). 1431--1439.
- Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI’15). 234--241. DOI:https://doi.org/10.1007/978-3-319-24574-4_28
- Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 815--823. DOI:https://doi.org/10.1109/CVPR.2015.7298682
- Steven M. Seitz and Charles R. Dyer. 1996. View morphing. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’96). 21--30. DOI:https://doi.org/10.1145/237170.237196
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
- Lingxiao Song, Zhihe Lu, Ran He, Zhenan Sun, and Tieniu Tan. 2018. Geometry guided adversarial facial expression synthesis. In Proceedings of the 26th ACM International Conference on Multimedia (MM’18). 627--635. DOI:https://doi.org/10.1145/3240508.3240612
- Joshua M. Susskind, Geoffrey E. Hinton, Javier R. Movellan, and Adam K. Anderson. 2008. Generating facial expressions with deep belief nets. In Affective Computing, J. Or (Ed.). I-Tech Education and Printing, London, United Kingdom, 421--440.
- Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16). 2387--2395. DOI:https://doi.org/10.1109/CVPR.2016.262
- Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. 2018. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 8798--8807. DOI:https://doi.org/10.1109/CVPR.2018.00917
- Xueping Wang, Weixin Li, Guodong Mu, Di Huang, and Yunhong Wang. 2018. Facial expression synthesis by U-Net conditional generative adversarial networks. In Proceedings of the 2018 ACM International Conference on Multimedia Retrieval (ICMR’18). 283--290. DOI:https://doi.org/10.1145/3206025.3206068
- Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. 2018. TextureGAN: Controlling deep image synthesis with texture patches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 8456--8465. DOI:https://doi.org/10.1109/CVPR.2018.00882
- Hongyu Yang, Di Huang, Yunhong Wang, and Anil K. Jain. 2018. Learning face age progression: A pyramid architecture of GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 31--39. DOI:https://doi.org/10.1109/CVPR.2018.00011
- Raymond Yeh, Ziwei Liu, Dan B Goldman, and Aseem Agarwala. 2016. Semantic facial expression editing using autoencoded flow. arXiv:1611.09961.
- Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23, 10 (2016), 1499--1503. DOI:https://doi.org/10.1109/LSP.2016.2603342
- Qingshan Zhang, Zicheng Liu, Baining Guo, Demetri Terzopoulos, and Heung-Yeung Shum. 2006. Geometry-driven photorealistic facial expression synthesis. IEEE Transactions on Visualization and Computer Graphics 12, 1 (2006), 48--60. DOI:https://doi.org/10.1109/TVCG.2006.9
- Guoying Zhao, Xiaohua Huang, Matti Taini, Stan Z. Li, and Matti Pietikäinen. 2011. Facial expression recognition from near-infrared videos. Image and Vision Computing 29, 9 (2011), 607--619. DOI:https://doi.org/10.1016/j.imavis.2011.07.002
- Yuqian Zhou and Bertram Emil Shi. 2017. Photorealistic facial expression synthesis by the conditional difference adversarial autoencoder. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII’17). 370--376. DOI:https://doi.org/10.1109/ACII.2017.8273626
- Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). 2242--2251. DOI:https://doi.org/10.1109/ICCV.2017.244
U-Net Conditional GANs for Photo-Realistic and Identity-Preserving Facial Expression Synthesis