Abstract
3D model retrieval is widely used in domains such as computer-aided design, digital entertainment, and virtual reality. Recently, many graph-based methods have been proposed to address this task using multi-view information of 3D models. However, these methods are constrained by many-to-many graph matching when measuring the similarity between pairwise models. In this article, we propose a multi-view graph matching method (MVGM) for 3D model retrieval. The proposed method decomposes the complicated multi-view graph-based similarity measure into multiple single-view graph-based similarity measures followed by fusion. First, we present a method for single-view graph generation, and we further propose a novel similarity measure on a single-view graph that leverages both node-wise context and model-wise context. Then, we propose multi-view fusion with diffusion, which collaboratively integrates multiple single-view similarities w.r.t. different viewpoints and adaptively learns their weights, to compute the multi-view similarity between pairwise models. In this way, the proposed method avoids the difficulty of defining and computing the traditional high-order graph. Moreover, the method is unsupervised and does not require a large-scale 3D dataset for model learning. We conduct evaluations on four popular and challenging datasets. Extensive experiments demonstrate the superiority and effectiveness of the proposed method compared with the state of the art. In particular, this unsupervised method achieves competitive performance against the most recent supervised and deep learning methods.
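The decomposition described in the abstract (one similarity matrix per viewpoint, then diffusion and weighted fusion) can be illustrated with a minimal sketch. This is not the MVGM formulation, which the abstract does not spell out. It assumes each view has already produced a non-negative model-to-model similarity matrix, uses a generic diffusion update of the form A <- alpha * P A P^T + (1 - alpha) * I, and takes fixed fusion weights instead of learning them adaptively. The names `row_normalize`, `diffuse`, `fuse_views`, `alpha`, and `iters` are hypothetical.

```python
import numpy as np


def row_normalize(S):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    S = np.asarray(S, dtype=float)
    return S / S.sum(axis=1, keepdims=True)


def diffuse(S, alpha=0.8, iters=20):
    """Generic diffusion (an assumption, not the paper's update rule):
    A <- alpha * P A P^T + (1 - alpha) * I, starting from A = I."""
    P = row_normalize(S)
    identity = np.eye(P.shape[0])
    A = identity.copy()
    for _ in range(iters):
        A = alpha * (P @ A @ P.T) + (1.0 - alpha) * identity
    return A


def fuse_views(similarities, weights=None, alpha=0.8, iters=20):
    """Weighted sum of diffused single-view similarities.

    The weights are fixed inputs here (uniform by default); the paper
    learns them adaptively, which this sketch does not attempt.
    """
    num_views = len(similarities)
    if weights is None:
        weights = np.full(num_views, 1.0 / num_views)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    diffused = [diffuse(S, alpha, iters) for S in similarities]
    return sum(w * A for w, A in zip(weights, diffused))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical views, five models, random symmetric similarities.
    views = []
    for _ in range(3):
        M = rng.random((5, 5)) + 0.1
        views.append((M + M.T) / 2.0)
    fused = fuse_views(views)
    # Rank the other models by fused similarity to model 0.
    ranking = np.argsort(-fused[0])
    print(ranking)
```

Keeping the per-view step separate from the fusion step mirrors the abstract's point that single-view similarities are computed first and combined afterwards, so no high-order graph over all views has to be built.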
Multi-View Graph Matching for 3D Model Retrieval