Abstract
Face reenactment aims to animate a source face using the poses and expressions of a target face. Recent methods based on generative adversarial networks have made remarkable progress. However, they struggle to produce high-fidelity, identity-preserving results because of inappropriate driving information and insufficiently effective animation strategies. In this work, we propose a novel face reenactment framework that achieves both high-fidelity generation and identity preservation. Instead of sparse face representations (e.g., facial landmarks and keypoints), we use the dense Projected Normalized Coordinate Code (PNCC) to better preserve facial details. To factor out the target identity, we reconstruct the PNCC from the source identity parameters and the target pose and expression parameters, all estimated by 3D face reconstruction. Using this reconstructed representation as the driving information, we address the problem of identity mismatch. To make effective use of the driving information, we establish a correspondence between the reconstructed representation and the source representation, based on features extracted by an encoder network. This identity-matched correspondence is then used to animate the source face through a novel feature transformation strategy. The generator network is further enhanced by the proposed geometry-aware skip connection. Once trained, our model can be applied to previously unseen faces without further training or fine-tuning. Extensive experiments demonstrate the effectiveness of our method and show that it outperforms state-of-the-art approaches both qualitatively and quantitatively. Additionally, the proposed PNCC reconstruction module can be easily inserted into other methods to improve their performance in cross-identity face reenactment.
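The PNCC reconstruction idea can be illustrated with a short sketch. This is not the authors' implementation. It rests on several assumptions:

- The face geometry comes from a generic linear 3D morphable model, with a mean shape plus identity and expression bases.
- Projection is weak-perspective.
- PNCC follows the common 3DDFA-style definition: mean-shape vertex coordinates are normalized to [0, 1] per axis, used as vertex colors, and rendered with a depth test.

All function names, the parameter dictionaries, and the point-splatting renderer (a crude stand-in for mesh rasterization) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: names, parameter layout and renderer are assumptions, not the paper's code.


def linear_3dmm(model, alpha_id, alpha_exp):
    """Linear 3DMM shape S = S_mean + A_id @ alpha_id + A_exp @ alpha_exp, returned as (N, 3)."""
    s = model["mean"] + model["id_basis"] @ alpha_id + model["exp_basis"] @ alpha_exp
    return s.reshape(-1, 3)


def weak_perspective(vertices, R, t, scale):
    """Rotate, scale and translate; x and y become pixel coordinates, z is kept for the depth test."""
    return scale * vertices @ R.T + t


def normalized_coordinate_code(mean_shape):
    """NCC: mean-shape coordinates rescaled to [0, 1] per axis, used as per-vertex colors."""
    v = mean_shape.reshape(-1, 3)
    v_min, v_max = v.min(axis=0), v.max(axis=0)
    return (v - v_min) / (v_max - v_min)


def splat_with_zbuffer(points, colors, height, width):
    """Point-splat renderer with a z-buffer; larger z is assumed to be closer to the camera."""
    image = np.zeros((height, width, 3))
    depth = np.full((height, width), -np.inf)
    xs = np.round(points[:, 0]).astype(int)
    ys = np.round(points[:, 1]).astype(int)
    for x, y, z, c in zip(xs, ys, points[:, 2], colors):
        if 0 <= x < width and 0 <= y < height and z > depth[y, x]:
            depth[y, x] = z
            image[y, x] = c
    return image


def reconstruct_pncc(model, source, target, height, width):
    """Identity from the source; expression and pose from the target (target["alpha_id"] is ignored)."""
    vertices = linear_3dmm(model, source["alpha_id"], target["alpha_exp"])
    points = weak_perspective(vertices, target["R"], target["t"], target["scale"])
    return splat_with_zbuffer(points, normalized_coordinate_code(model["mean"]), height, width)


if __name__ == "__main__":
    # Toy random "3DMM" and parameters, purely for illustration.
    rng = np.random.default_rng(0)
    n_vertices, k_id, k_exp = 500, 10, 5
    model = {
        "mean": rng.uniform(-1, 1, 3 * n_vertices),
        "id_basis": 0.05 * rng.standard_normal((3 * n_vertices, k_id)),
        "exp_basis": 0.05 * rng.standard_normal((3 * n_vertices, k_exp)),
    }

    def random_params():
        return {"alpha_id": rng.standard_normal(k_id), "alpha_exp": rng.standard_normal(k_exp),
                "R": np.eye(3), "t": np.array([32.0, 32.0, 0.0]), "scale": 25.0}

    source, target = random_params(), random_params()
    pncc = reconstruct_pncc(model, source, target, 64, 64)
    print(pncc.shape)  # (64, 64, 3)
```

The identity swap happens in the single call to `linear_3dmm`. That call takes `alpha_id` from the source and `alpha_exp` from the target. As a result, the target's identity coefficients never reach the rendered code, which mirrors the factoring-out step described in the abstract. A real pipeline would obtain these parameters from a 3D face reconstruction network and rasterize the mesh triangles instead of splatting vertices.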
High-Fidelity Face Reenactment Via Identity-Matched Correspondence Learning