Abstract
We present a process for rendering a realistic facial performance with control of viewpoint and illumination. The performance is based on one or more high-quality geometry and reflectance scans of an actor in static poses, driven by one or more video streams of a performance. We compute optical flow correspondences between neighboring video frames, and a sparse set of correspondences between static scans and video frames. The latter are made possible by leveraging the relightability of the static 3D scans to match the viewpoint(s) and appearance of the actor in videos taken in arbitrary environments. As optical flow tends to compute proper correspondence for some areas but not others, we also compute a smoothed, per-pixel confidence map for every computed flow, based on normalized cross-correlation. These flows and their confidences yield a set of weighted triangulation constraints among the static poses and the frames of a performance. Given a single artist-prepared face mesh for one static pose, we optimally combine the weighted triangulation constraints, along with a shape regularization term, into a consistent 3D geometry solution over the entire performance that is drift free by construction. In contrast to previous work, even partial correspondences contribute to drift minimization, for example, where a successful match is found in the eye region but not the mouth. Our shape regularization employs a differential shape term based on a spatially varying blend of the differential shapes of the static poses and neighboring dynamic poses, weighted by the associated flow confidences. These weights also permit dynamic reflectance maps to be produced for the performance by blending the static scan maps. Finally, as the geometry and maps are represented on a consistent artist-friendly mesh, we render the resulting high-quality animated face geometry and animated reflectance maps using standard rendering tools.
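To make the per-pixel flow confidence described above concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation): it warps a neighboring frame by a computed dense flow, measures windowed normalized cross-correlation (NCC) against the reference frame, and smooths the result into a confidence map. The flow convention, window sizes, smoothing parameters, and function names are illustrative assumptions.

```python
# Sketch of a smoothed, per-pixel NCC confidence map for a dense optical flow.
# Assumptions: grayscale images, flow stored as an (H, W, 2) array of (dx, dy)
# pixel displacements; windowing is approximated with Gaussian filters.

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp_with_flow(image_b, flow):
    """Warp image_b toward the reference frame using the dense flow field."""
    image_b = np.asarray(image_b, dtype=np.float64)
    h, w = image_b.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])  # (row, col) samples
    return map_coordinates(image_b, coords, order=1, mode='nearest')

def ncc_confidence(image_a, image_b, flow, window_sigma=2.0, smooth_sigma=4.0):
    """Per-pixel confidence in [0, 1] from windowed NCC between image_a and warped image_b."""
    image_a = np.asarray(image_a, dtype=np.float64)
    warped = warp_with_flow(image_b, flow)
    # Local means, variances, and covariance via Gaussian windows.
    mu_a = gaussian_filter(image_a, window_sigma)
    mu_w = gaussian_filter(warped, window_sigma)
    var_a = gaussian_filter(image_a ** 2, window_sigma) - mu_a ** 2
    var_w = gaussian_filter(warped ** 2, window_sigma) - mu_w ** 2
    cov = gaussian_filter(image_a * warped, window_sigma) - mu_a * mu_w
    ncc = cov / np.sqrt(np.maximum(var_a * var_w, 1e-8))
    # Clamp negative correlations to zero confidence and smooth the map.
    return gaussian_filter(np.clip(ncc, 0.0, 1.0), smooth_sigma)
```

In the pipeline outlined above, such confidences would weight the triangulation constraints, the differential-shape regularization, and the blending of static scan reflectance maps; the sketch only computes the map and does not model those later stages.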
Supplemental Material
Supplemental movie and image files for "Driving High-Resolution Facial Scans with Video Performance Capture" are available for download.