Abstract
The task of image multi-label classification is to accurately recognize multiple objects in an input image. Most recent methods build their graph structure from a label co-occurrence matrix counted on the training data, which is inflexible and may degrade model generalizability. In addition, these methods fail to capture the semantic correlation between channel feature maps, which could further improve model performance. To address these issues, we propose DA-GAT (a Double Attention framework based on the Graph Attention neTwork) to effectively learn the correlation between labels from training data. First, we devise a new channel attention mechanism to enhance the semantic correlation between channel feature maps, so as to implicitly capture the correlation between labels. Second, we propose a new label attention mechanism to avoid the adverse impact of a manually constructed label co-occurrence matrix. It takes only the label embeddings as the input of the network and automatically constructs the label relation matrix to explicitly establish the correlation between labels. Finally, we fuse the outputs of these two attention mechanisms to further improve model performance. Extensive experiments are conducted on three public multi-label classification benchmarks. Our DA-GAT model achieves mean average precision of 87.1%, 96.6%, and 64.3% on MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE, respectively, clearly outperforming existing state-of-the-art methods. In addition, visual analysis experiments demonstrate that each attention mechanism captures the correlation between labels well and significantly improves model performance.
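To make the idea of a learned label relation matrix concrete, the following is a minimal NumPy sketch, not the DA-GAT implementation. The abstract does not specify the scoring function, so the sketch assumes the standard graph attention scoring (LeakyReLU over a learned vector applied to concatenated projected embeddings) on a fully connected label graph. The function name `label_relation_matrix`, the parameters `W` and `a`, and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def label_relation_matrix(label_emb, W, a, negative_slope=0.2):
    """Graph-attention-style relation matrix over a fully connected label graph.

    label_emb: (C, d) label embeddings (e.g. word vectors of the label names).
    W:         (d, k) projection matrix (assumed learnable).
    a:         (2k,)  attention vector (assumed learnable).
    Returns a (C, C) matrix whose row i is a softmax over the scores that
    label i assigns to every label j.
    """
    h = label_emb @ W                        # projected embeddings, (C, k)
    k = h.shape[1]
    # a^T [h_i || h_j] decomposes into a source term plus a target term.
    src = h @ a[:k]                          # (C,)
    dst = h @ a[k:]                          # (C,)
    e = src[:, None] + dst[None, :]          # pairwise scores, (C, C)
    e = np.where(e > 0, e, negative_slope * e)   # LeakyReLU
    e = e - e.max(axis=1, keepdims=True)     # subtract row max for stability
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)


# Illustrative sizes only: 5 labels, 16-d embeddings, 8-d projection.
rng = np.random.default_rng(0)
C, d, k = 5, 16, 8
emb = rng.normal(size=(C, d))
W = rng.normal(size=(d, k)) * 0.1
a = rng.normal(size=(2 * k,))

A = label_relation_matrix(emb, W, a)
updated = A @ (emb @ W)   # relation-aware label features, (C, k)
```

In contrast to a co-occurrence matrix counted from training annotations, every entry of `A` here depends only on the label embeddings and the parameters `W` and `a`, so in a trained model the relations would be learned end to end rather than fixed in advance.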
Index Terms
Double Attention Based on Graph Attention Network for Image Multi-Label Classification