Real-time 3D neural facial animation from binocular video

Published: 19 July 2021

Abstract

We present a method for performing real-time facial animation of a 3D avatar from binocular video. Existing facial animation methods fail to automatically capture the precise and subtle facial motions required to drive a photo-realistic 3D avatar "in the wild" (i.e., under variable illumination and camera noise). The novelty of our approach lies in a lightweight process for specializing a personalized face model to new environments, which enables extremely accurate real-time face tracking anywhere. Our method uses a pre-trained, high-fidelity personalized model of the face, which we complement with a novel illumination model to account for variations due to lighting and other factors often encountered in the wild (e.g., facial hair growth, makeup, skin blemishes). Our approach comprises two steps. First, we solve for our illumination model's parameters by applying analysis-by-synthesis to a short video recording. Using the resulting pairs of model parameters (rigid and non-rigid) and original images, we learn a regression from image space to the 3D shape and texture of the avatar that supports real-time inference. Second, given a new video, we fine-tune this regression model with a few-shot learning strategy to adapt it to the new environment. We demonstrate our system's ability to precisely capture subtle facial motions in unconstrained scenarios, in comparison to competing methods, on a diverse collection of identities, expressions, and real-world environments.
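To make the two-step pipeline above concrete, the following is a minimal, hypothetical sketch in PyTorch. It is not the paper's implementation: the `GainBiasIllumination` module, the `render_fn` avatar decoder, and the `regressor` network are placeholder assumptions standing in for the personalized face model, the illumination model, and the real-time regression network described in the abstract.

```python
# Hypothetical sketch only -- placeholder models, not the authors' code.
import torch
import torch.nn as nn


class GainBiasIllumination(nn.Module):
    """Toy stand-in for an illumination model: a learnable per-channel
    gain and bias applied to the avatar's rendered image."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(channels, 1, 1))

    def forward(self, rendered: torch.Tensor) -> torch.Tensor:
        return rendered * self.gain + self.bias


def fit_illumination(render_fn, frames, codes, steps=200, lr=1e-2):
    """Step 1 (analysis-by-synthesis): optimize illumination parameters so
    the re-rendered avatar matches a short recording of the environment."""
    illum = GainBiasIllumination()
    opt = torch.optim.Adam(illum.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((illum(render_fn(c)) - f) ** 2).mean()   # photometric error
                   for f, c in zip(frames, codes))
        loss.backward()
        opt.step()
    return illum


def finetune_regressor(regressor, frames, codes, steps=50, lr=1e-4):
    """Step 2 (few-shot adaptation): fine-tune the real-time image-to-avatar
    regressor on a handful of (image, code) pairs from the new environment."""
    opt = torch.optim.Adam(regressor.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((regressor(f) - c) ** 2).mean()
                   for f, c in zip(frames, codes))
        loss.backward()
        opt.step()
    return regressor


if __name__ == "__main__":
    # Smoke test with dummy stand-ins: an 8-D avatar code, 16x16 RGB frames.
    decode = nn.Sequential(nn.Linear(8, 3 * 16 * 16),
                           nn.Unflatten(-1, (3, 16, 16)))
    regress = nn.Sequential(nn.Flatten(start_dim=0),
                            nn.Linear(3 * 16 * 16, 8))
    frames = [torch.rand(3, 16, 16) for _ in range(4)]
    codes = [torch.randn(8) for _ in range(4)]
    illum = fit_illumination(decode, frames, codes, steps=20)
    regress = finetune_regressor(regress, frames, codes, steps=10)
```

The structure mirrors the abstract's two steps: analysis-by-synthesis fits environment-specific illumination parameters on a short clip, then a few gradient steps adapt the regressor to the resulting (image, parameter) pairs.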

Supplemental Material

a87-cao.mp4
3450626.3459806.mp4

Published in

  ACM Transactions on Graphics, Volume 40, Issue 4
  August 2021
  2170 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3450626

Copyright © 2021 Owner/Author. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Publisher

  Association for Computing Machinery, New York, NY, United States

Publication History

  Published: 19 July 2021 in ACM Transactions on Graphics (TOG), Volume 40, Issue 4

Qualifiers

  research-article
