Abstract
Extracting distinctive features is a major challenge in fine-grained image classification. Previous models have frequently adopted bilinear pooling to address this problem. However, most bilinear pooling models neglect either intra-layer or inter-layer feature interaction, and this insufficient interaction leads to a loss of discriminative information. In this article, we devise a novel fine-grained image classification approach named Multi-scale Selective Hierarchical biQuadratic Pooling (MSHQP). The proposed biquadratic pooling simultaneously models intra-layer and inter-layer feature interactions and enhances part response by integrating multi-layer features. The subsequent coarse-to-fine multi-scale interaction structure captures the complementary information within features. Finally, the active interaction selection module adaptively learns the optimal interaction subset for a specific dataset. Consequently, we obtain a robust image representation with coarse-to-fine semantics. We conduct experiments on five benchmark datasets. The experimental results demonstrate that MSHQP is competitive with, or even matches, state-of-the-art methods in terms of both accuracy and computational efficiency, with 89.0%, 94.9%, 93.4%, 90.4%, and 91.5% top-1 classification accuracy on CUB-200-2011, Stanford-Cars, FGVC-Aircraft, Stanford-Dog, and VegFru, respectively.
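To make the distinction between intra-layer and inter-layer interaction concrete, the following is a minimal NumPy sketch of plain bilinear pooling, the baseline the abstract refers to. It is not the MSHQP biquadratic pooling itself, whose formulation is not given here. The function name `bilinear_pool`, the feature-map sizes, and the random inputs are illustrative assumptions. The signed square root and L2 normalization follow common practice for bilinear descriptors.

```python
import numpy as np


def bilinear_pool(x, y):
    """Bilinear pooling of two feature maps shaped (C, H, W) with equal H and W.

    Averages the outer product of the two channel vectors over all spatial
    locations, then applies signed square root and L2 normalization.
    Illustrative baseline only; not the biquadratic pooling of MSHQP.
    """
    cx, h, w = x.shape
    cy = y.shape[0]
    xf = x.reshape(cx, h * w)            # (Cx, HW)
    yf = y.reshape(cy, h * w)            # (Cy, HW)
    z = (xf @ yf.T) / (h * w)            # (Cx, Cy) pairwise channel interactions
    z = z.reshape(-1)
    z = np.sign(z) * np.sqrt(np.abs(z))  # signed square root
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


# Hypothetical activations from two layers of a backbone (assumed sizes).
rng = np.random.default_rng(0)
layer_a = rng.standard_normal((8, 7, 7))
layer_b = rng.standard_normal((8, 7, 7))

intra = bilinear_pool(layer_a, layer_a)  # intra-layer: a layer with itself
inter = bilinear_pool(layer_a, layer_b)  # inter-layer: two different layers
descriptor = np.concatenate([intra, inter])
```

A model that uses only `intra` or only `inter` discards the other kind of interaction. Concatenating both is one simple way to keep them, at the cost of a longer descriptor.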
Index Terms
Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling