research-article

U-Net Conditional GANs for Photo-Realistic and Identity-Preserving Facial Expression Synthesis

Published: 15 October 2019

Abstract

Facial expression synthesis (FES) is a challenging task, since expression changes are highly non-linear and depend on the facial appearance; the identity of the person should also be well preserved in the synthesized face. In this article, we present a novel U-Net Conditional Generative Adversarial Network for FES. The U-Net helps retain the properties of the input face, including the identity information and facial details. A category condition is added to the U-Net model so that one-to-many expression synthesis can be achieved with a single model. We also design constraints for identity preservation during FES to further guarantee that the identity of the input face is well preserved in the generated face image. Specifically, we pair the generated output with condition images of other identities for the discriminator, encouraging it to learn the distinctions between synthesized and natural images, as well as between the input identity and other identities, which improves its discriminating ability. Additionally, we utilize the triplet loss to keep the generated face images close to those of the same identity by imposing a margin between positive pairs and negative pairs in feature space. Both qualitative and quantitative evaluations are conducted on the Oulu-CASIA NIR&VIS facial expression database, the Radboud Faces Database, and the Karolinska Directed Emotional Faces database, and the experimental results show that our method can generate faces with natural and realistic expressions while preserving identity information.
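The identity constraint mentioned in the abstract follows the standard triplet-loss formulation. The paper's feature extractor and margin value are not given here, so the sketch below is only illustrative: plain NumPy vectors stand in for face embeddings, and the `margin` of 0.2 is a hypothetical hyperparameter.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: push the anchor-negative distance to exceed
    the anchor-positive distance by at least `margin` in feature space."""
    d_pos = float(np.sum((anchor - positive) ** 2))  # squared distance to same-identity feature
    d_neg = float(np.sum((anchor - negative) ** 2))  # squared distance to other-identity feature
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the generated face's feature (anchor) should sit near the
# source identity (positive) and away from a different identity (negative).
anchor = np.array([0.0, 1.0])
positive = np.array([0.1, 1.0])
negative = np.array([1.0, 0.0])
loss = triplet_loss(anchor, positive, negative)
```

Minimizing this term pulls the generated image's feature toward same-identity features until the margin is satisfied, at which point the loss is zero and gradients vanish.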



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 3s
Special Issue on Face Analysis for Applications and Special Issue on Affective Computing for Large-Scale Heterogeneous Multimedia Data
November 2019, 304 pages
ISSN: 1551-6857, EISSN: 1551-6865
DOI: 10.1145/3368027

Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 November 2018
• Revised: 1 August 2019
• Accepted: 1 August 2019
• Published: 15 October 2019
