LiveCap: Real-Time Human Performance Capture From Monocular Video

Published: 13 March 2019

Abstract

We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured with a novel real-time-capable algorithm for non-rigid tracking that uses dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per frame are solved with specially tailored data-parallel Gauss-Newton solvers. To achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture, and it yields accuracy comparable to off-line performance capture techniques while being orders of magnitude faster.
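
To make the Gauss-Newton step mentioned in the abstract concrete, here is a minimal sketch on a toy problem: fitting the two joint angles of a planar two-bone chain to 2D joint detections. It is not the paper's solver, which is data-parallel, runs on GPUs, and uses a much richer energy. The chain, the unit bone lengths, and the function names `joints`, `jacobian`, and `gauss_newton` are assumptions made purely for illustration.

```python
import math


def joints(theta, l1=1.0, l2=1.0):
    """Forward kinematics of a hypothetical planar two-bone chain.

    Returns the 2D positions of the two joints as [ex, ey, wx, wy].
    """
    t1, t2 = theta
    ex, ey = l1 * math.cos(t1), l1 * math.sin(t1)
    wx, wy = ex + l2 * math.cos(t1 + t2), ey + l2 * math.sin(t1 + t2)
    return [ex, ey, wx, wy]


def jacobian(theta, l1=1.0, l2=1.0):
    """Analytic 4x2 Jacobian of joints() with respect to (t1, t2)."""
    t1, t2 = theta
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [
        [-l1 * s1, 0.0],
        [l1 * c1, 0.0],
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ]


def gauss_newton(target, theta, iters=10):
    """Minimize the sum of squared residuals joints(theta) - target."""
    for _ in range(iters):
        # Residuals between predicted and detected joint positions.
        r = [p - t for p, t in zip(joints(theta), target)]
        J = jacobian(theta)
        # Normal equations (J^T J) delta = -J^T r for two unknowns.
        a = sum(row[0] * row[0] for row in J)
        b = sum(row[0] * row[1] for row in J)
        d = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a * d - b * b
        # Closed-form solve of the 2x2 system (Cramer's rule).
        delta0 = -(d * g0 - b * g1) / det
        delta1 = -(a * g1 - b * g0) / det
        theta = [theta[0] + delta0, theta[1] + delta1]
    return theta


# Synthetic "detections" generated from known angles, then recovered
# starting from a rest pose of zero angles.
target = joints([0.7, -0.4])
estimate = gauss_newton(target, [0.0, 0.0])
```

Each iteration linearizes the residuals around the current estimate and solves a small linear system. In a per-frame capture setting the previous frame's solution would be a natural starting point, though that choice is also an assumption here rather than something the abstract states.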

Published in

ACM Transactions on Graphics, Volume 38, Issue 2
April 2019
112 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3313807

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 13 March 2019
• Revised: 1 January 2019
• Accepted: 1 January 2019
• Received: 1 September 2018

Qualifiers

• research-article
• Research
• Refereed