Abstract
We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, to 2D and 3D skeleton joint positions found using a deep neural network, and to a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured with a novel real-time-capable algorithm for non-rigid tracking that uses dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per frame are solved with specially tailored data-parallel Gauss-Newton solvers. To achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture, and it yields accuracy comparable to that of off-line performance capture techniques while being orders of magnitude faster.
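To make the Gauss-Newton step mentioned in the abstract concrete, the following is a minimal sketch on a toy problem: aligning a handful of 2D model points to observed points with a rotation angle and a translation. This is not the paper's energy, template, or solver. The residual, the function names (`residuals_and_jacobian`, `solve_linear`, `gauss_newton`), and the point data are assumptions made for illustration, and the loop runs sequentially on the CPU rather than data-parallel on a GPU.

```python
import math


def residuals_and_jacobian(params, model_pts, observed_pts):
    """Stack residuals R(theta) p + t - q and their Jacobian, two rows per point."""
    theta, tx, ty = params
    c, s = math.cos(theta), math.sin(theta)
    r, J = [], []
    for (px, py), (qx, qy) in zip(model_pts, observed_pts):
        r.append(c * px - s * py + tx - qx)   # x residual
        J.append([-s * px - c * py, 1.0, 0.0])
        r.append(s * px + c * py + ty - qy)   # y residual
        J.append([c * px - s * py, 0.0, 1.0])
    return r, J


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gauss_newton(model_pts, observed_pts, params=(0.0, 0.0, 0.0), iters=10):
    """Repeatedly solve the normal equations (J^T J) delta = -J^T r and update."""
    params = list(params)
    n = len(params)
    for _ in range(iters):
        r, J = residuals_and_jacobian(params, model_pts, observed_pts)
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(n)]
               for a in range(n)]
        Jtr = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(n)]
        delta = solve_linear(JtJ, [-g for g in Jtr])
        params = [p + d for p, d in zip(params, delta)]
    return params


# Hypothetical toy data: four model points moved by a known rigid transform.
model = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
true_theta, true_tx, true_ty = 0.4, 0.5, -0.2
observed = [
    (math.cos(true_theta) * x - math.sin(true_theta) * y + true_tx,
     math.sin(true_theta) * x + math.cos(true_theta) * y + true_ty)
    for x, y in model
]
estimate = gauss_newton(model, observed)
```

In this sketch each point contributes its residual rows independently, which is the kind of structure that lends itself to data-parallel evaluation; how the authors exploit that on the GPU is not described in the abstract.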
Supplemental Material
Supplemental movie and image files for LiveCap: Real-Time Human Performance Capture From Monocular Video