Abstract
Single-view three-dimensional (3D) object reconstruction has long been a challenging task. Objects with complex topologies are hard to reconstruct accurately, so existing methods tend to blur the shape boundaries between the multiple components of an object. Moreover, most of them cannot balance the learning of global geometric structure information against that of local detail information. In this article, we propose a multi-scale edge-guided learning network (MEGLN) that uses global edge information to guide the network to better capture and recover local details. The goal is to exploit a multi-scale learning strategy to learn both global edge information and local details, thus achieving robust 3D object reconstruction. First, we design a multi-scale Gaussian difference block (MGDB) that extracts global edge geometry features from input images at different scales and adopts an attention mechanism to aggregate the extracted features across scales. Second, we design a multi-scale feature interaction block (MFIB) to learn local details; it uses multi-scale feature interaction to capture the features of multiple objects or components at multiple scales. Under the guidance of global edge information, the MFIB captures as much local detail information as possible. Finally, we dynamically fuse the probabilities predicted by the MGDB and the MFIB to obtain the final prediction. Through this multi-scale learning strategy, MEGLN recovers 3D shapes with complex global topological structures and rich local details. Extensive qualitative and quantitative experiments on the ShapeNet dataset demonstrate that our approach achieves competitive performance compared with state-of-the-art methods. Code is available at https://github.com/Ray-tju/MEGLN.
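To make the two central ideas more concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the linked repository for that). It shows (a) difference-of-Gaussians (DoG) edge extraction on an image pyramid, with the per-scale edge maps combined by softmax attention weights, and (b) a gated, convex fusion of two probability arrays. Several choices are assumptions made for illustration only: the sigma values, the 2x average-pooling pyramid, the use of each scale's mean edge energy as a parameter-free stand-in for a learned attention score, and the sigmoid gate `gate_logit`. All function names are hypothetical. The fusion assumes per-point occupancy probabilities, as in occupancy-network-style decoders.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel truncated at radius ceil(3 * sigma)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge-replicated padding; output keeps img's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # Blur along rows, then columns; 'valid' convolution strips the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def difference_of_gaussians(img: np.ndarray, sigma: float, ratio: float = 1.6) -> np.ndarray:
    """Band-pass DoG response; large magnitudes mark intensity edges."""
    return gaussian_blur(img, sigma) - gaussian_blur(img, ratio * sigma)


def downsample2(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling (assumed pyramid operator)."""
    h, w = img.shape
    return img[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()


def multiscale_dog_edges(img: np.ndarray, levels: int = 3, sigma: float = 1.0):
    """DoG edge maps at several input scales, fused with softmax attention weights."""
    h, w = img.shape
    step = 2 ** (levels - 1)
    assert h % step == 0 and w % step == 0, "size must be divisible by 2**(levels-1)"
    maps, current = [], img
    for level in range(levels):
        edges = np.abs(difference_of_gaussians(current, sigma))
        f = 2 ** level
        # Nearest-neighbour upsampling back to the input resolution.
        maps.append(np.repeat(np.repeat(edges, f, axis=0), f, axis=1))
        current = downsample2(current)
    stack = np.stack(maps)                 # (levels, H, W)
    scores = stack.mean(axis=(1, 2))       # stand-in for a learned attention score
    weights = softmax(scores)              # one weight per scale, summing to 1
    return np.tensordot(weights, stack, axes=1), weights


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_fuse(p_edge, p_detail, gate_logit):
    """Convex combination w * p_edge + (1 - w) * p_detail with w = sigmoid(gate_logit)."""
    w = sigmoid(gate_logit)
    return w * p_edge + (1.0 - w) * p_detail


if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[16:48, 16:48] = 1.0  # bright square: its boundary should respond strongly
    edges, weights = multiscale_dog_edges(img)
    print(edges.shape, weights)
    # gate_logit = 0 gives w = 0.5, i.e. the element-wise mean of the two inputs.
    print(dynamic_fuse(np.array([0.9, 0.2]), np.array([0.7, 0.4]), gate_logit=0.0))
```

In a trained network, the attention scores and the fusion gate would be produced by learnable layers (the gate could also be per point rather than scalar); the fixed heuristics here only keep the sketch self-contained and deterministic.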
Multi-scale Edge-guided Learning for 3D Reconstruction