
Aligning Image Semantics and Label Concepts for Image Multi-Label Classification

Published: 06 February 2023

Abstract

The image multi-label classification task aims to correctly predict the multiple object categories present in an image. To capture correlations between labels, methods based on graph convolutional networks must manually count label co-occurrence probabilities from the training data to construct a pre-defined graph as the input to the graph network, which is inflexible and may degrade model generalizability. Moreover, most current methods cannot effectively align the learned salient object features with the label concepts, so the model's predictions may be inconsistent with the image content. Therefore, learning salient semantic features of images, capturing the correlations between labels, and then effectively aligning the two is key to improving the performance of image multi-label classification. To this end, we propose a novel image multi-label classification framework that aligns Image Semantics with Label Concepts (ISLC). Specifically, we propose a residual encoder to learn salient object features in images, and we exploit the self-attention layers of an aligned decoder to automatically capture correlations between labels. We then leverage the cross-attention layers of the aligned decoder to align image semantic features with label concepts, making the labels predicted by the model more consistent with the image content. Finally, the output features of the last layers of the residual encoder and the aligned decoder are fused to obtain the final feature for classification. The proposed ISLC model achieves strong results on prevalent multi-label image datasets, scoring 87.2% on MS-COCO 2014, 96.9% on PASCAL VOC 2007, 39.4% on VG-500, and 64.2% on NUS-WIDE.
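The decoder flow described in the abstract (self-attention over label embeddings to capture label correlations, then cross-attention against encoder features to align labels with image content) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: learned projection matrices, multi-head splitting, layer normalization, and feed-forward sublayers are omitted, and the final per-label scoring is a hypothetical dot product.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (L, d) x (P, d) -> (L, d).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def aligned_decoder_step(label_emb, image_feats):
    """One simplified decoder layer.

    Self-attention over label embeddings lets each label attend to the
    others (label correlations); cross-attention then lets each label
    query the image patch features (semantic alignment). Residual
    connections mirror the standard transformer decoder layout.
    """
    labels = label_emb + attention(label_emb, label_emb, label_emb)
    aligned = labels + attention(labels, image_feats, image_feats)
    return aligned

rng = np.random.default_rng(0)
num_labels, num_patches, dim = 80, 49, 64          # e.g. 80 COCO labels, 7x7 patches
label_emb = rng.standard_normal((num_labels, dim))  # learnable label concepts (assumed)
image_feats = rng.standard_normal((num_patches, dim))  # from the residual encoder (assumed)

out = aligned_decoder_step(label_emb, image_feats)
logits = (out * label_emb).sum(axis=-1)  # hypothetical per-label score, shape (80,)
```

In the full model, several such layers would be stacked and the decoder output fused with the encoder output before classification; this sketch only shows how cross-attention ties each label query to the image features.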



• Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2
March 2023, 540 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572860
• Editor: Abdulmotaleb El Saddik


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 February 2023
        • Online AM: 21 July 2022
        • Accepted: 19 July 2022
        • Revised: 7 July 2022
        • Received: 28 February 2022
Published in TOMM, Volume 19, Issue 2

        Qualifiers

        • research-article
        • Refereed
