research-article
Multi-scale Edge-guided Learning for 3D Reconstruction

Published: 25 February 2023

Abstract

Single-view three-dimensional (3D) object reconstruction is a long-standing and challenging task. Objects with complex topologies are hard to reconstruct accurately, so existing methods suffer from blurred shape boundaries between the multiple components of an object. Moreover, most of them cannot balance learning global geometric structure against learning local detail. In this article, we propose a multi-scale edge-guided learning network (MEGLN) that uses global edge information to guide the network toward better capturing and recovering local details. The goal is to exploit a multi-scale learning strategy to learn both global edge information and local details, thus achieving robust 3D object reconstruction. We first design a multi-scale Gaussian difference block (MGDB) that extracts global edge geometry features from input images at different scales and adopts an attention mechanism to aggregate the extracted features across scales. Second, we design a multi-scale feature interaction block (MFIB) to learn local details; it uses multi-scale feature interaction to capture the features of multiple objects or components at multiple scales. Under the guidance of global edge information, the MFIB can capture as much local detail as possible. Finally, we dynamically fuse the predicted probabilities of the MGDB and MFIB to obtain the final prediction, which enables MEGLN to recover 3D shapes with complex global topology and rich local details. Extensive qualitative and quantitative experiments on the ShapeNet dataset demonstrate that our approach achieves competitive performance compared with state-of-the-art methods. Code is available at https://github.com/Ray-tju/MEGLN.
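The MGDB's name points at the classical difference-of-Gaussians (DoG) operation, which responds strongly near intensity edges and vanishes in flat regions. The sketch below is a framework-free, NumPy-only illustration of computing such edge responses at several scales; the function names, the truncation radius, and the 1.6 sigma ratio (a common Laplacian-of-Gaussian approximation) are our illustrative choices, not details taken from the paper, whose learned block operates on deep features rather than raw pixels.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1D Gaussian kernel truncated at 3*sigma, normalized to sum to 1.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    # Separable 2D blur: edge-pad, then convolve along rows and columns.
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def multiscale_dog_edges(img, sigmas=(1.0, 2.0, 4.0)):
    # Subtracting two blurs (sigma vs. 1.6*sigma) gives a band-pass edge
    # response; stacking several sigmas yields edge maps from fine to coarse.
    return np.stack([gaussian_blur(img, s) - gaussian_blur(img, 1.6 * s)
                     for s in sigmas])
```

On a vertical step edge, each map in the stack is near zero in the flat regions and peaks around the discontinuity, with the response widening as sigma grows; a network can then weight these scale-specific maps, e.g. with attention, as the abstract describes.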



Published in
ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3 (May 2023), 514 pages. ISSN 1551-6857, EISSN 1551-6865. Issue DOI: 10.1145/3582886. Editor: Abdulmotaleb El Saddik.


Publisher
Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 30 November 2021
• Revised: 17 August 2022
• Accepted: 8 October 2022
• Online AM: 20 October 2022
• Published: 25 February 2023
Published in TOMM Volume 19, Issue 3
