Abstract
Image-based 3D model retrieval aims to organize unlabeled 3D models according to their relevance to labeled 2D images. With the easy accessibility of 2D images and the wide application of 3D models, image-based 3D model retrieval has attracted increasing attention. However, it remains a challenging problem because of the modality gap between 2D images and 3D models. Domain adaptation techniques, which usually align the global distribution statistics of the two domains, have brought remarkable progress on this topic, but they are limited in learning discriminative features for target samples because the target domain lacks label information. In this article, besides utilizing the label information of the 2D image domain and adversarial domain alignment, we additionally incorporate self-supervision to address the cross-domain 3D model retrieval problem. Specifically, we simultaneously optimize adversarial adaptation for both domains based on visual features and contrastive learning for the unlabeled 3D model domain, which helps the feature extractor learn discriminative feature representations. Contrastive learning maps view representations of the same model close together while pushing view representations of different models far apart. To guarantee adequate, high-quality negative samples for contrastive learning, we design a memory bank that stores and updates a representative view for each 3D model based on the entropy minimization principle. Comprehensive experimental results on the public image-based 3D model retrieval datasets MI3DOR and MI3DOR-2 demonstrate the effectiveness of the proposed method.
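The contrastive component and the memory bank described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss formula, the temperature, or the exact quantity whose entropy is minimized. The sketch assumes an InfoNCE-style loss, assumes that the representative view is the one whose classifier prediction has the lowest entropy, and uses hypothetical names (`select_representative_view`, `info_nce`) and toy data throughout.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def select_representative_view(view_features, view_logits):
    """Pick the view with the lowest prediction entropy (assumed criterion).

    view_features: (num_views, dim) features of one 3D model's rendered views.
    view_logits:   (num_views, num_classes) classifier outputs for those views.
    Returns the chosen view index and its normalized feature.
    """
    probs = softmax(view_logits, axis=1)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    best = int(np.argmin(entropy))
    return best, l2_normalize(view_features[best])


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style loss for one anchor (assumed form of the contrastive loss).

    anchor, positive: (dim,) views of the same 3D model.
    negatives:        (num_negatives, dim) memory-bank entries of other models.
    """
    anchor = l2_normalize(anchor)
    positive = l2_normalize(positive)
    negatives = l2_normalize(negatives)
    pos_logit = anchor @ positive / temperature
    neg_logits = negatives @ anchor / temperature
    logits = np.concatenate(([pos_logit], neg_logits))
    logits = logits - logits.max()  # stabilize the log-sum-exp
    return float(np.log(np.exp(logits).sum()) - logits[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_models, num_views, dim, num_classes = 4, 3, 8, 5

    # Toy stand-ins for extractor features and classifier logits.
    features = rng.normal(size=(num_models, num_views, dim))
    logits = rng.normal(size=(num_models, num_views, num_classes))

    # Memory bank: one representative view feature per 3D model.
    bank = np.zeros((num_models, dim))
    for m in range(num_models):
        _, bank[m] = select_representative_view(features[m], logits[m])

    # Contrast two views of model 0 against the other models' bank entries.
    loss = info_nce(features[0, 0], features[0, 1], np.delete(bank, 0, axis=0))
    print("contrastive loss for model 0:", loss)
```

In a training loop, the bank entries would presumably be refreshed as the feature extractor changes; the update schedule is not stated in the abstract and is left out here.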
Index Terms
Self-supervised Image-based 3D Model Retrieval