Abstract
We present a method for real-time facial animation of a 3D avatar from binocular video. Existing facial animation methods fail to automatically capture the precise and subtle facial motions needed to drive a photo-realistic 3D avatar "in the wild" (i.e., under variable illumination, camera noise, and similar conditions). The novelty of our approach lies in a lightweight process for specializing a personalized face model to new environments, enabling extremely accurate real-time face tracking anywhere. Our method uses a pre-trained, high-fidelity, personalized model of the face, which we complement with a novel illumination model to account for variations due to lighting and other factors often encountered in the wild (e.g., facial hair growth, makeup, skin blemishes). Our approach comprises two steps. First, we solve for our illumination model's parameters by applying analysis-by-synthesis to a short video recording. Using the resulting pairs of model parameters (rigid and non-rigid) and the original images, we learn a regression from image space to the 3D shape and texture of the avatar for real-time inference. Second, given a new video, we fine-tune the regression model with a few-shot learning strategy to adapt it to the new environment. We demonstrate our system's ability to precisely capture subtle facial motions in unconstrained scenarios, in comparison to competing methods, on a diverse collection of identities, expressions, and real-world environments.
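The two-step pipeline described above lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation: `AvatarDecoder`, the per-channel gain/bias illumination model, the `Regressor` architecture, and all dimensions are hypothetical stand-ins for the paper's personalized avatar and illumination models. Step 1 fits illumination parameters and per-frame expression codes by analysis-by-synthesis on a short recording, then distills the resulting (image, parameter) pairs into a fast regressor; Step 2 would repeat the fitting on a few frames from a new environment and briefly fine-tune the regressor on them.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the paper's personalized avatar decoder and
# illumination model are not public, so we mock simplified interfaces.
class AvatarDecoder(nn.Module):
    """Maps an expression code plus illumination gain/bias to an image."""
    def __init__(self, code_dim=16, img_size=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size))
        self.img_size = img_size

    def forward(self, code, gain, bias):
        img = self.net(code).view(-1, 3, self.img_size, self.img_size)
        # Per-channel gain/bias as a toy model of environment lighting.
        return img * gain.view(1, 3, 1, 1) + bias.view(1, 3, 1, 1)

class Regressor(nn.Module):
    """Fast image -> avatar-code regressor used at runtime."""
    def __init__(self, code_dim=16, img_size=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * img_size * img_size, 256), nn.ReLU(),
            nn.Linear(256, code_dim))

    def forward(self, img):
        return self.net(img)

decoder = AvatarDecoder()
frames = torch.rand(8, 3, 32, 32)   # short calibration recording

# Step 1a: analysis-by-synthesis -- jointly fit per-frame codes and the
# shared illumination parameters so renders match the recording.
codes = torch.zeros(8, 16, requires_grad=True)
gain = torch.ones(3, requires_grad=True)
bias = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([codes, gain, bias], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = (decoder(codes, gain, bias) - frames).pow(2).mean()
    loss.backward()
    opt.step()

# Step 1b: distill the fitted (image, code) pairs into the regressor,
# which then replaces the slow optimization at runtime.
regressor = Regressor()
opt = torch.optim.Adam(regressor.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = (regressor(frames) - codes.detach()).pow(2).mean()
    loss.backward()
    opt.step()

# Step 2 (few-shot adaptation): re-run Step 1a on a handful of frames
# from the new environment, then fine-tune `regressor` on those pairs
# for a few iterations before real-time tracking resumes.
```

The design rationale, under these simplifying assumptions, is that the expensive per-frame optimization runs only offline on a short recording, while the distilled regressor amortizes that cost into a single forward pass per frame, which is what makes real-time tracking feasible.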