Dual Projective Zero-Shot Learning Using Text Descriptions

Published: 05 January 2023

Abstract

Zero-shot learning (ZSL) aims to recognize image instances of unseen classes based solely on the semantic descriptions of those classes. Within this field, Generalized Zero-Shot Learning (GZSL) is a particularly challenging problem in which images of both seen and unseen classes are mixed at test time. Existing methods formulate GZSL as a semantic-visual correspondence problem and apply generative models such as Generative Adversarial Networks and Variational Autoencoders to solve it. However, these methods suffer from a bias problem: images of unseen classes are often misclassified into seen classes. In this work, we propose a novel model, the Dual Projective model for Zero-Shot Learning (DPZSL), which learns from text descriptions. To alleviate the bias problem, we leverage two autoencoders to project the visual and semantic features into a latent space and evaluate the embeddings with a visual-semantic correspondence loss function. A novel classifier is also introduced to ensure the discriminability of the embedded features. Our method targets the more challenging inductive ZSL setting, in which only labeled data from seen classes are used during training. Experimental results on two popular datasets, Caltech-UCSD Birds-200-2011 (CUB) and North America Birds (NAB), show that the proposed DPZSL model significantly outperforms state-of-the-art methods in both the inductive ZSL and GZSL settings. In the GZSL setting in particular, our model yields an improvement of up to 15.2% over the state-of-the-art CANZSL on the CUB and NAB datasets under two different splits.
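The abstract describes the core recipe of DPZSL: two autoencoders that project visual and semantic (text) features into a shared latent space, a visual-semantic correspondence loss that aligns the two embeddings, and a classifier that keeps the latent codes discriminative. The paper's exact architecture and loss weights are not given here, so the following PyTorch sketch is only a minimal illustration of that recipe; all module names, layer sizes, and the weighting terms `lam` and `mu` are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    """Projects one modality into the shared latent space and reconstructs it."""
    def __init__(self, in_dim, latent_dim, hidden_dim=1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

class DPZSLSketch(nn.Module):
    """Dual autoencoders plus a classifier head over the visual latent codes
    (hypothetical structure inferred from the abstract)."""
    def __init__(self, vis_dim, sem_dim, latent_dim, num_seen_classes):
        super().__init__()
        self.vis_ae = AutoEncoder(vis_dim, latent_dim)   # image features
        self.sem_ae = AutoEncoder(sem_dim, latent_dim)   # text-description features
        self.classifier = nn.Linear(latent_dim, num_seen_classes)

def training_loss(model, x_vis, x_sem, labels, lam=1.0, mu=1.0):
    z_v, rec_v = model.vis_ae(x_vis)
    z_s, rec_s = model.sem_ae(x_sem)
    # Reconstruction terms keep each projection information-preserving.
    l_rec = F.mse_loss(rec_v, x_vis) + F.mse_loss(rec_s, x_sem)
    # Visual-semantic correspondence: an image and its class description
    # should land on the same point in the latent space.
    l_corr = F.mse_loss(z_v, z_s)
    # Classification on the latent codes enforces discriminability.
    l_cls = F.cross_entropy(model.classifier(z_v), labels)
    return l_rec + lam * l_corr + mu * l_cls

@torch.no_grad()
def predict(model, x_vis, class_sem):
    """Assigns each image the (seen or unseen) class whose embedded text
    description is nearest in the shared latent space."""
    z_v = model.vis_ae.encoder(x_vis)        # (batch, latent_dim)
    z_c = model.sem_ae.encoder(class_sem)    # (num_classes, latent_dim)
    return torch.cdist(z_v, z_c).argmin(dim=1)
```

Note how this matches the inductive setting: training only ever sees labeled seen-class pairs, yet `predict` can score unseen classes at test time because their text descriptions pass through the same semantic encoder.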

REFERENCES

[1] Zeynep Akata, Mateusz Malinowski, Mario Fritz, and Bernt Schiele. 2016. Multi-cue zero-shot learning with strong supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 59–68.
[2] Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. 2015. Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 7 (2015), 1425–1438.
[3] Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, and Bernt Schiele. 2015. Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 2927–2936.
[4] Yashas Annadani and Soma Biswas. 2018. Preserving semantic relations for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 7603–7612.
[5] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2017), 2481–2495.
[6] Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2016. Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 5327–5336.
[7] Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2020. Classifier and exemplar synthesis for zero-shot learning. International Journal of Computer Vision 128, 1 (2020), 166–201.
[8] Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, and Fei Sha. 2016. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In European Conference on Computer Vision. Springer, Amsterdam, The Netherlands, 52–68.
[9] Zhi Chen, Jingjing Li, Yadan Luo, Zi Huang, and Yang Yang. 2020. CANZSL: Cycle-consistent adversarial networks for zero-shot learning from natural language. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, Snowmass Village, Colorado, 874–883.
[10] Yu-Ying Chou, Hsuan-Tien Lin, and Tyng-Luh Liu. 2020. Adaptive and generative zero-shot learning. In International Conference on Learning Representations. Vienna, Austria.
[11] Peng Cui, Shaowei Liu, and Wenwu Zhu. 2017. General knowledge embedded image representation learning. IEEE Transactions on Multimedia 20, 1 (2017), 198–207.
[12] Mohamed Elhoseiny and Mohamed Elfeki. 2019. Creativity inspired zero-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, South Korea, 5784–5793.
[13] Mohamed Elhoseiny, Ahmed Elgammal, and Babak Saleh. 2016. Write a classifier: Predicting visual classifiers from unstructured text. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2016), 2539–2553.
[14] Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, and Ahmed Elgammal. 2017. Link the head to the “beak”: Zero shot learning from noisy text description at part precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 5640–5649.
[15] Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems 26 (2013).
[16] Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 1440–1448.
[17] Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, and Liqing Zhang. 2020. Context-aware feature generation for zero-shot semantic segmentation. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1921–1929.
[18] Xintong Han, Bharat Singh, Vlad I. Morariu, and Larry S. Davis. 2017. VRFP: On-the-fly video retrieval using web images and fast Fisher vector products. IEEE Transactions on Multimedia 19, 7 (2017), 1583–1595.
[19] He Huang, Changhu Wang, Philip S. Yu, and Chang-Dong Wang. 2019. Generative dual adversarial network for generalized zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, 801–810.
[20] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR'15). San Diego, CA.
[21] Elyor Kodirov, Tao Xiang, Zhenyong Fu, and Shaogang Gong. 2015. Unsupervised domain adaptation for zero-shot learning. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 2452–2460.
[22] Elyor Kodirov, Tao Xiang, and Shaogang Gong. 2017. Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 3174–3183.
[23] Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. 2013. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (2013), 453–465.
[24] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[25] Jingjing Li, Mengmeng Jing, Ke Lu, Lei Zhu, Yang Yang, and Zi Huang. 2019. Alleviating feature confusion for generative zero-shot learning. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1587–1595.
[26] Kai Li, Martin Renqiang Min, and Yun Fu. 2019. Rethinking zero-shot learning: A conditional visual classification perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, South Korea, 3583–3592.
[27] Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, and Zeynep Akata. 2021. Open world compositional zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 5222–5230.
[28] Ashish Mishra, Shiva Krishna Reddy, Anurag Mittal, and Hema A. Murthy. 2018. A generative model for zero shot learning using conditional variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, Salt Lake City, Utah, USA, 2188–2196.
[29] Pedro Morgado and Nuno Vasconcelos. 2017. Semantically consistent regularization for zero-shot recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, HI, USA, 6060–6069.
[30] Arghya Pal and Vineeth N. Balasubramanian. 2019. Zero-shot task transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, USA, 2189–2198.
[31] Yuxin Peng and Jinwei Qi. 2019. CM-GANs: Cross-modal generative adversarial networks for common representation learning. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 1 (2019), 1–24.
[32] Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, and Anton van den Hengel. 2016. Less is more: Zero-shot learning from online textual documents with noise suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 2249–2257.
[33] Shafin Rahman, Salman Khan, and Nick Barnes. 2019. Deep0Tag: Deep multiple instance learning for zero-shot image tagging. IEEE Transactions on Multimedia 22, 1 (2019), 242–255.
[34] Bernardino Romera-Paredes and Philip Torr. 2015. An embarrassingly simple approach to zero-shot learning. In International Conference on Machine Learning. JMLR.org, Lille, France, 2152–2161.
[35] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
[36] Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, and Zeynep Akata. 2019. Generalized zero- and few-shot learning via aligned variational autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 8247–8255.
[37] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[38] Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. 2018. Transductive unbiased embedding for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 1024–1033.
[39] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
[40] Grant Van Horn, Steve Branson, Ryan Farrell, Scott Haber, Jessie Barry, Panos Ipeirotis, Pietro Perona, and Serge Belongie. 2015. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 595–604.
[41] Vinay Kumar Verma, Dhanajit Brahma, and Piyush Rai. 2020. Meta-learning for generalized zero-shot learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. New York, USA, 6062–6069.
[42] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.
[43] Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology 10, 2 (2019), 1–37.
[44] Zheng Wang, Ruimin Hu, Chao Liang, Yi Yu, Junjun Jiang, Mang Ye, Jun Chen, and Qingming Leng. 2015. Zero-shot person re-identification via cross-view consistency. IEEE Transactions on Multimedia 18, 2 (2015), 260–272.
[45] Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning. PMLR, New York, USA, 478–487.
[46] Wenju Xu, Shawn Keshmiri, and Guanghui Wang. 2019. Adversarially approximated autoencoder for image generation and manipulation. IEEE Transactions on Multimedia 21, 9 (2019), 2387–2396.
[47] Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2Image: Conditional image generation from visual attributes. In European Conference on Computer Vision. Springer, Amsterdam, The Netherlands, 776–791.
[48] Lei Zhang, Peng Wang, Lingqiao Liu, Chunhua Shen, Wei Wei, Yanning Zhang, and Anton van den Hengel. 2020. Towards effective deep embedding for zero-shot learning. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2020), 2843–2852.
[49] Li Zhang, Tao Xiang, and Shaogang Gong. 2017. Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 2021–2030.
[50] Ziming Zhang and Venkatesh Saligrama. 2015. Zero-shot learning via semantic similarity embedding. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 4166–4174.
[51] Ziming Zhang and Venkatesh Saligrama. 2016. Zero-shot learning via joint latent similarity embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 6034–6042.
[52] Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, and Ahmed Elgammal. 2018. A generative adversarial approach for zero-shot learning from noisy texts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 1004–1013.

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
January 2023, 505 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572858
Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 30 May 2021
• Revised: 22 October 2021
• Accepted: 25 January 2022
• Online AM: 29 July 2022
• Published: 5 January 2023
