Abstract
Recovering the geometry of an object from a single depth image is an interesting yet challenging problem. While previous learning-based approaches have demonstrated promising performance, they do not fully explore the spatial relationships of objects, which leads to unfaithful and incomplete 3D reconstruction. To address these issues, we propose a Spatial Relationship Preserving Adversarial Network (SRPAN), consisting of a 3D Capsule Attention Generative Adversarial Network (3DCAGAN) and a 2D Generative Adversarial Network (2DGAN), for coarse-to-fine 3D reconstruction from a single depth view of an object. First, 3DCAGAN predicts the coarse geometry using an encoder-decoder-based generator and a discriminator. The generator encodes the input as latent capsules, represented as stacked activity vectors that capture local-to-global relationships (i.e., the contribution of components to the whole shape), and then decodes the capsules by modeling local-to-local relationships (i.e., the relationships among components) with an attention mechanism. Afterwards, 2DGAN refines the local geometry slice by slice, using a generator that learns a global structure prior as guidance and stacked discriminators that enforce local geometric constraints. Experimental results show that SRPAN not only outperforms several state-of-the-art methods by a large margin on both synthetic and real-world datasets, but also reconstructs unseen object categories with higher accuracy.
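To make the capsule-and-attention idea concrete, the following is a minimal, dependency-free sketch and not the SRPAN implementation. It assumes the standard capsule "squash" nonlinearity from the capsule-network literature and plain scaled dot-product self-attention with identity projections. The abstract does not specify either choice, and the function names `squash` and `capsule_self_attention` are hypothetical.

```python
import math


def squash(vec):
    """Capsule squash nonlinearity (assumed, from the capsule-network literature).

    Keeps the direction of `vec` and maps its length into [0, 1), so the
    length can be read as how strongly a component is present.
    """
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def capsule_self_attention(capsules):
    """Scaled dot-product self-attention over capsule vectors.

    Assumption: queries, keys, and values are the capsules themselves
    (identity projections); a trained model would learn these projections.
    Each output capsule is a weighted mix of all capsules, which is one
    simple way to express local-to-local relationships among components.
    """
    dim = len(capsules[0])
    attended = []
    for query in capsules:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in capsules
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * cap[i] for w, cap in zip(weights, capsules)) for i in range(dim)]
        )
    return attended


if __name__ == "__main__":
    # Three toy 2-D "activity vectors" standing in for latent capsules.
    latent = [squash(v) for v in ([3.0, 4.0], [0.0, 0.0], [1.0, 0.0])]
    for capsule in capsule_self_attention(latent):
        print(capsule)
```

In this toy setup each attended capsule is a convex combination of the input capsules, so the mixing weights show how much each component draws on the others.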
A Spatial Relationship Preserving Adversarial Network for 3D Reconstruction from a Single Depth View