Abstract
Three-dimensional (3D) shape recognition is an active research topic with potential applications in computer vision. With the recent proliferation of deep learning, a variety of deep models have achieved state-of-the-art performance on this task. Among them, multiview-based 3D shape representations have received increasing attention, and related approaches have significantly improved 3D shape recognition. However, these methods concentrate on feature learning through network design and ignore the correlation among views. In this article, we propose a novel progressive feature guide learning network (PGNet) that exploits the correlation among multiple views and integrates multiple modalities for 3D shape recognition. In particular, we propose two information fusion schemes, one visual and one feature-based. The visual fusion scheme operates at the view level and employs a soft-attention model to assign a weight to each view for visual information fusion. The feature fusion scheme operates on the feature dimensions and uses the quantified feature as a mask to further optimize the feature. Together, these two schemes constitute PGNet for 3D shape representation. We evaluate our approach on the classic ModelNet40 and ShapeNetCore55 datasets, and the experimental results demonstrate its superiority.
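To make the idea of view-level soft attention concrete, the following is a minimal sketch and not the PGNet implementation. It assumes each view has already been encoded as a feature vector and that a scoring vector (here the hypothetical `query`, which would be learned in a real model) rates each view. A softmax turns the scores into view weights, and the fused descriptor is the weighted sum of the view features. The function names and the toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(view_features, query):
    """Fuse per-view feature vectors into a single descriptor.

    view_features: list of V feature vectors, each a list of D floats.
    query: hypothetical scoring vector of length D (an assumption;
           a trained network would learn this or a richer scorer).
    Returns (weights, fused), where weights sum to 1 over the views.
    """
    # Score each view by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in view_features]
    # Soft attention: normalize the scores into view weights.
    weights = softmax(scores)
    # Weighted sum of the view features, dimension by dimension.
    dim = len(view_features[0])
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, view_features))
        for d in range(dim)
    ]
    return weights, fused


# Toy example: three views with two-dimensional features.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, fused = attention_fuse(views, query)
```

In this toy input the third view receives the highest score, so it contributes the most to the fused descriptor, while the first two views are weighted equally.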
PGNet: Progressive Feature Guide Learning Network for Three-dimensional Shape Recognition