Abstract
Zero-shot learning (ZSL) aims to recognize image instances of unseen classes based solely on semantic descriptions of those classes. Generalized Zero-Shot Learning (GZSL) is a more challenging variant in which images of both seen and unseen classes appear together at test time. Existing methods formulate GZSL as a semantic-visual correspondence problem and apply generative models such as Generative Adversarial Networks and Variational Autoencoders to solve it. However, these methods suffer from a bias problem: images of unseen classes are often misclassified as seen classes. In this work, we propose a novel model, the Dual Projective model for Zero-Shot Learning (DPZSL), that uses text descriptions. To alleviate the bias problem, we leverage two autoencoders to project the visual and semantic features into a latent space and evaluate the embeddings with a visual-semantic correspondence loss function. A novel classifier is also introduced to ensure the discriminability of the embedded features. Our method focuses on the more challenging inductive ZSL setting, in which only labeled data from seen classes are used in the training phase. Experimental results on two popular datasets, Caltech-UCSD Birds-200-2011 (CUB) and North America Birds (NAB), show that the proposed DPZSL model significantly outperforms existing methods in both the inductive ZSL and GZSL settings. In the GZSL setting in particular, our model yields an improvement of up to 15.2% over the state-of-the-art CANZSL on the CUB and NAB datasets with two splits.
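The dual-projection idea described above can be illustrated with a minimal sketch. It is not the DPZSL implementation: it assumes linear encoders and a squared-distance correspondence loss, neither of which is specified in the abstract, and it omits the decoders, the classifier, and training. All function names and the toy values are hypothetical.

```python
def project(weights, features):
    """Linear encoder (an assumption): map a feature vector into the latent space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def squared_distance(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def correspondence_loss(w_visual, w_semantic, visual_batch, semantic_batch):
    """Mean squared distance between paired visual and semantic embeddings.

    A stand-in for a visual-semantic correspondence loss; the abstract
    does not give the actual form of the loss.
    """
    total = 0.0
    for visual, semantic in zip(visual_batch, semantic_batch):
        z_visual = project(w_visual, visual)
        z_semantic = project(w_semantic, semantic)
        total += squared_distance(z_visual, z_semantic)
    return total / len(visual_batch)


def predict(w_visual, w_semantic, visual_feature, class_semantics):
    """Return the class whose projected description is nearest in latent space."""
    z = project(w_visual, visual_feature)
    return min(
        class_semantics,
        key=lambda label: squared_distance(
            z, project(w_semantic, class_semantics[label])
        ),
    )


# Toy example with hand-picked identity encoders (hypothetical values).
identity = [[1.0, 0.0], [0.0, 1.0]]
class_semantics = {"seen_class": [0.0, 0.0], "unseen_class": [1.0, 1.0]}

loss = correspondence_loss(identity, identity, [[1.0, 0.0]], [[0.0, 0.0]])
label = predict(identity, identity, [0.9, 1.1], class_semantics)
print(loss, label)
```

Because both modalities share one latent space, an unseen class can be predicted from its text description alone: the description is projected once, and test images are matched to the nearest projected description. In the GZSL setting, `class_semantics` would hold both seen and unseen classes, which is where the bias toward seen classes shows up.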
Index Terms
Dual Projective Zero-Shot Learning Using Text Descriptions