research-article

PGNet: Progressive Feature Guide Learning Network for Three-dimensional Shape Recognition

Published: 22 July 2021

Abstract

Three-dimensional (3D) shape recognition is an active topic in computer vision with broad potential applications. With the recent proliferation of deep learning, various deep models have achieved state-of-the-art performance on this task. Among them, multiview-based 3D shape representations have received increasing attention, and related approaches have significantly improved 3D shape recognition. However, these methods concentrate on feature learning through network design and ignore the correlation among views. In this article, we propose a novel progressive feature guide learning network (PGNet) that focuses on the correlation among multiple views and integrates multiple modalities for 3D shape recognition. In particular, we propose two information fusion schemes, one visual and one feature-based. The visual fusion scheme operates at the view level and employs a soft-attention model to define the weights of views for visual information fusion. The feature fusion scheme operates on the feature dimensions and employs the quantified feature as a mask to further optimize the feature. Together, these two schemes constitute PGNet for 3D shape representation. We evaluate our approach on the classic ModelNet40 and ShapeNetCore55 datasets, and the experiments demonstrate its superiority.
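
To make the two fusion ideas concrete, the following is a minimal sketch in plain Python, not the PGNet implementation. It assumes that view-level soft attention can be illustrated as a softmax over per-view scores, and that "quantified feature as a mask" can be illustrated as a binarized, sigmoid-squashed copy of the feature; both readings are assumptions, since the abstract does not give the formulas. The names `attention_fuse`, `mask_feature`, `scoring_vector`, and `threshold` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(view_features, scoring_vector):
    """Fuse per-view feature vectors with soft-attention weights.

    Assumption: each view is scored by a dot product with a
    (hypothetical, normally learned) scoring vector.
    """
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector))
        for feat in view_features
    ]
    weights = softmax(scores)
    dim = len(view_features[0])
    # Weighted sum of the view features, one dimension at a time.
    fused = [
        sum(weights[v] * view_features[v][d] for v in range(len(view_features)))
        for d in range(dim)
    ]
    return fused, weights


def mask_feature(feature, threshold=0.5):
    """Gate a feature with a binary mask derived from the feature itself.

    Assumption: "quantified" is read here as thresholding a sigmoid
    of each feature dimension; the paper may define it differently.
    """
    mask = [
        1.0 if 1.0 / (1.0 + math.exp(-x)) >= threshold else 0.0
        for x in feature
    ]
    masked = [x * m for x, m in zip(feature, mask)]
    return masked, mask


if __name__ == "__main__":
    # Three toy "views", each described by a 2-dimensional feature.
    views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused, weights = attention_fuse(views, scoring_vector=[0.5, 0.5])
    masked, mask = mask_feature(fused)
    print("view weights:", weights)
    print("fused feature:", fused)
    print("masked feature:", masked)
```

In this sketch the attention weights always sum to one, so the fused vector is a convex combination of the view features; the mask then zeroes out dimensions whose squashed value falls below the threshold.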



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 3
August 2021
443 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3476118

Copyright © 2021 Association for Computing Machinery.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Received: 1 June 2020
• Revised: 1 December 2020
• Accepted: 1 December 2020
• Published: 22 July 2021


Qualifiers

• research-article
• Refereed