Abstract
Three-dimensional (3D) human-like body reconstruction from a single RGB image has attracted significant research attention recently. Most existing methods rely on the Skinned Multi-Person Linear (SMPL) model and thus can only predict unified human bodies. Moreover, meshes reconstructed by current methods sometimes look accurate from a canonical view but not from other views, because the reconstruction process is commonly supervised by only a single view. To address these limitations, this article proposes a multi-view shape generation network for 3D human-like bodies. Specifically, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground-truth body. Our model uses multi-view renderings and the corresponding 3D vertex transformations as supervision, which helps generate 3D bodies that are well aligned with all views. To perform mesh deformation accurately, a graph convolutional network structure is introduced to support shape generation from the 3D vertex representation. Additionally, a graph up-pooling operation is designed over the intermediate representations of the graph convolutional network so that our model can generate 3D shapes with higher resolution. Novel loss functions are employed to help optimize the whole multi-view generation model, resulting in smoother surfaces. In addition, two multi-view human body datasets are produced and contributed to the community. Extensive experiments on the benchmark datasets demonstrate the efficacy of our model over competing methods.
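The abstract does not say how the graph up-pooling operation is defined. As a rough illustration only, the sketch below assumes an edge-midpoint scheme, in which one new vertex is added at the midpoint of every unique edge and each triangle is split into four. This is an assumption for illustration, not the article's confirmed design. The name `graph_up_pool` and the plain-tuple mesh representation are hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def graph_up_pool(
    vertices: List[Vertex], faces: List[Face]
) -> Tuple[List[Vertex], List[Face]]:
    """Edge-midpoint up-pooling (assumed scheme, not taken from the article).

    Adds one vertex at the midpoint of every unique edge and splits
    each triangle into four smaller triangles.
    """
    new_vertices: List[Vertex] = list(vertices)
    # Maps an undirected edge (low index, high index) to its midpoint's index.
    midpoint_index: Dict[Tuple[int, int], int] = {}

    def midpoint(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            va, vb = vertices[a], vertices[b]
            mid = (
                (va[0] + vb[0]) / 2.0,
                (va[1] + vb[1]) / 2.0,
                (va[2] + vb[2]) / 2.0,
            )
            new_vertices.append(mid)
            midpoint_index[key] = len(new_vertices) - 1
        return midpoint_index[key]

    new_faces: List[Face] = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central triangle.
        new_faces.extend([(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)])
    return new_vertices, new_faces


# Usage: a tetrahedron with 4 vertices, 4 faces, and 6 unique edges.
tetra_vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
fine_vertices, fine_faces = graph_up_pool(tetra_vertices, tetra_faces)
print(len(fine_vertices), len(fine_faces))
```

In a network, the same midpoint averaging would presumably be applied to per-vertex feature vectors rather than to raw coordinates, so that later graph convolution layers can refine the added vertices.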
Index Terms
Multi-view Shape Generation for a 3D Human-like Body