research-article

Multi-view Shape Generation for a 3D Human-like Body

Published: 05 January 2023

Abstract

Three-dimensional (3D) human-like body reconstruction from a single RGB image has recently attracted significant research attention. Most existing methods rely on the Skinned Multi-Person Linear (SMPL) model and thus can only predict unified human bodies. Moreover, meshes reconstructed by current methods sometimes look accurate from a canonical view but not from other views, because the reconstruction process is commonly supervised by only a single view. To address these limitations, this article proposes a multi-view shape generation network for a 3D human-like body. Specifically, we propose a coarse-to-fine learning model that gradually deforms a template body toward the ground-truth body. Our model uses multi-view renderings and the corresponding 3D vertex transformations as supervision, which helps generate 3D bodies that are well aligned with all views. To deform the mesh accurately, a graph convolutional network structure is introduced to support shape generation from the 3D vertex representation. Additionally, a graph up-pooling operation is designed over the intermediate representations of the graph convolutional network, so that our model can generate 3D shapes at higher resolution. Novel loss functions are employed to help optimize the whole multi-view generation model, resulting in smoother surfaces. In addition, two multi-view human body datasets are produced and contributed to the community. Extensive experiments conducted on the benchmark datasets demonstrate the advantage of our model over competing methods.
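
As a rough illustration of the coarse-to-fine idea described in the abstract, the following Python sketch applies one graph convolution to predict per-vertex offsets for a template mesh and then up-pools the deformed mesh by inserting edge midpoints. The abstract does not specify the layer design, so the normalized-adjacency convolution (in the style of Kipf and Welling), the edge-midpoint up-pooling (in the style of Pixel2Mesh), the untrained random weights, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def normalized_adjacency(num_vertices, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected mesh graph."""
    a = np.eye(num_vertices)  # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj_norm, weight):
    """One graph convolution: average over neighbors, then a linear map."""
    return adj_norm @ features @ weight


def graph_up_pool(vertices, edges):
    """Append one new vertex at the midpoint of every edge.

    Only vertex positions are produced here; rebuilding the finer
    connectivity is omitted to keep the sketch short.
    """
    midpoints = np.array([(vertices[i] + vertices[j]) / 2.0 for i, j in edges])
    return np.vstack([vertices, midpoints])


# Toy "template body": a single triangle (hypothetical data).
template = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
edges = [(0, 1), (1, 2), (0, 2)]

adj = normalized_adjacency(len(template), edges)

# Untrained weights stand in for a learned layer (assumption).
rng = np.random.default_rng(0)
weight = rng.normal(scale=0.01, size=(3, 3))

offsets = graph_conv(template, adj, weight)  # predicted per-vertex offsets
coarse = template + offsets                  # coarse deformation step
fine = graph_up_pool(coarse, edges)          # higher-resolution vertex set
```

In a trained model, the offsets would come from several stacked layers driven by image features and would be supervised from multiple views; here a single random layer only shows how the data flows from the template to a denser vertex set.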


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
      January 2023
      505 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3572858
      • Editor: Abdulmotaleb El Saddik

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 January 2023
      • Online AM: 18 February 2022
      • Accepted: 28 January 2022
      • Revised: 26 January 2022
      • Received: 21 July 2021

      Qualifiers

      • research-article
      • Refereed
