Abstract
Few-shot classification studies the problem of quickly adapting a deep learner to novel classes given only a few support images. In this context, recent research efforts have focused on designing increasingly complex classifiers that measure similarities between query and support images, while leaving the importance of feature embeddings largely unexplored. We show that reliance on sophisticated classifiers is unnecessary: a simple classifier applied directly to improved feature embeddings can instead outperform most leading methods in the literature. To this end, we present DCAP, a new method for few-shot classification in which we investigate how the quality of embeddings can be improved by leveraging Dense Classification and Attentive Pooling. Specifically, we propose to first train a learner on base classes with abundant samples to solve a dense classification problem, and then meta-train the learner on plenty of randomly sampled few-shot tasks to adapt it to the few-shot (i.e., test-time) scenario. During meta-training, we pool feature maps with attentive pooling, instead of the widely used global average pooling, to prepare embeddings for few-shot classification. Attentive pooling learns to reweight local descriptors, explaining what the learner looks for as evidence when making decisions. Experiments on two benchmark datasets show that the proposed method is superior in multiple few-shot settings while being simpler and more explainable. Code is publicly available at https://github.com/Ukeyboard/dcap/.
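To make the contrast between global average pooling and attentive pooling concrete, the following is a minimal sketch (not the paper's actual DCAP implementation): each spatial location of a feature map is treated as a local descriptor, and attentive pooling scores every descriptor against a learned query vector and takes a softmax-weighted sum, whereas global average pooling weights all descriptors uniformly. The `query` vector here is a hypothetical stand-in for whatever scoring parameters the learned attention module uses.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def global_average_pool(descriptors):
    # descriptors: list of equal-length local descriptor vectors,
    # one per spatial location; every location gets uniform weight
    n, d = len(descriptors), len(descriptors[0])
    return [sum(v[i] for v in descriptors) / n for i in range(d)]

def attentive_pool(descriptors, query):
    # score each local descriptor against a (hypothetical) learned
    # query vector, then take a softmax-weighted sum: informative
    # locations receive larger weights than background locations
    scores = [sum(q * x for q, x in zip(query, v)) for v in descriptors]
    weights = softmax(scores)
    d = len(descriptors[0])
    return [sum(w * v[i] for w, v in zip(weights, descriptors))
            for i in range(d)]
```

Note that with a zero query all scores tie and attentive pooling degenerates to global average pooling; the reweighting only matters once the attention scores separate, which is also what makes the learned weights inspectable as evidence for a decision.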
Revisiting Local Descriptor for Improved Few-Shot Classification