Editorial Notes
The authors requested minor, non-substantive changes to the Version of Record (VoR) and, in accordance with ACM policies, a Corrected Version of Record was published on March 18, 2022. For reference purposes, the original VoR may still be accessed via the Supplemental Material section on this citation page.
Abstract
In this article, we exploit Semi-Supervised Learning (SSL) to increase the amount of training data and thereby improve the performance of Fine-Grained Visual Categorization (FGVC). This problem has received very little attention despite the prohibitive annotation costs that FGVC requires. Our approach leverages unlabeled data with an adversarial optimization strategy in which the internal feature representation is obtained with a second-order pooling model. This combination allows the part information represented by second-order pooling to be back-propagated onto unlabeled data in an adversarial training setting. We demonstrate the effectiveness of the combined use through experiments on six state-of-the-art fine-grained datasets: Aircraft, Stanford Cars, CUB-200-2011, Oxford Flowers, Stanford Dogs, and the recent Semi-Supervised iNaturalist-Aves. The experimental results show that the proposed method outperforms the only previous approach that examined this problem and achieves higher classification accuracy than the supervised learning methods with which we compared it.
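The abstract gives no implementation details, so the following is only a minimal sketch of what second-order pooling generally means: a set of local descriptors (one per spatial location of a feature map) is summarized by its covariance matrix, so that pairwise channel interactions are retained. The function name `second_order_pooling`, the centering step, and the toy values are assumptions made for illustration; this is not the authors' model, and it omits the adversarial training component entirely.

```python
from typing import List


def second_order_pooling(features: List[List[float]]) -> List[List[float]]:
    """Pool N local descriptors of dimension C into a C x C covariance matrix.

    Illustrative assumption: plain (biased) covariance with mean centering.
    Published second-order pooling models often add further normalization.
    """
    n = len(features)
    if n == 0:
        raise ValueError("need at least one local descriptor")
    c = len(features[0])

    # Mean descriptor over all spatial locations.
    mean = [sum(f[j] for f in features) / n for j in range(c)]

    # Average outer product of the centered descriptors.
    cov = [[0.0] * c for _ in range(c)]
    for f in features:
        centered = [f[j] - mean[j] for j in range(c)]
        for a in range(c):
            for b in range(c):
                cov[a][b] += centered[a] * centered[b] / n
    return cov


# Toy input (hypothetical values): 4 spatial locations, 2 channels.
toy = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0], [7.0, 14.0]]
pooled = second_order_pooling(toy)
```

The pooled matrix is symmetric and its size depends only on the number of channels, not on the number of spatial locations, which is why such a representation can be fed to a fixed-size classifier regardless of input resolution.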
Supplemental Material
Version of Record for "Fine-Grained Adversarial Semi-Supervised Learning" by Mugnai et al., ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, No. 1s (TOMM 18:1s).