research-article

Fine-grained Image Classification via Multi-scale Selective Hierarchical Biquadratic Pooling

Published: 25 January 2022

Abstract

Extracting distinctive features is a central challenge in fine-grained image classification. Previous models have frequently adopted bilinear pooling to address this problem. However, most bilinear pooling models neglect either intra-layer or inter-layer feature interaction, and this insufficient interaction causes a loss of discriminative information. In this article, we devise a novel fine-grained image classification approach named Multi-scale Selective Hierarchical biQuadratic Pooling (MSHQP). The proposed biquadratic pooling simultaneously models intra-layer and inter-layer feature interactions and enhances part response by integrating multi-layer features. The subsequent coarse-to-fine multi-scale interaction structure captures the complementary information within features. Finally, the active interaction selection module adaptively learns the optimal interaction subset for a specific dataset. Consequently, we obtain a robust image representation with coarse-to-fine semantics. We conduct experiments on five benchmark datasets. The experimental results demonstrate that MSHQP is competitive with, or even matches, state-of-the-art methods in terms of both accuracy and computational efficiency, with 89.0%, 94.9%, 93.4%, 90.4%, and 91.5% top-1 classification accuracy on CUB-200-2011, Stanford-Cars, FGVC-Aircraft, Stanford-Dog, and VegFru, respectively.
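For readers unfamiliar with the terminology, the distinction between intra-layer and inter-layer interaction can be illustrated with plain bilinear pooling. The sketch below is not the MSHQP method, whose biquadratic formulation is not given in this abstract; it only shows the standard bilinear baseline the abstract refers to. The function name `bilinear_pool`, the feature-map shapes, and the random inputs are assumptions chosen for illustration.

```python
import numpy as np

def bilinear_pool(x, y):
    """Bilinear pooling of two feature maps of shape (C, H, W) sharing H and W.

    Illustrative baseline only (an assumption, not the paper's biquadratic
    pooling): averages the outer product of the two channel vectors over all
    spatial locations, then applies signed square root and L2 normalization.
    """
    cx, h, w = x.shape
    cy = y.shape[0]
    xf = x.reshape(cx, h * w)               # channels x locations
    yf = y.reshape(cy, h * w)
    z = (xf @ yf.T / (h * w)).reshape(-1)   # averaged outer product, flattened
    z = np.sign(z) * np.sqrt(np.abs(z))     # signed square root
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

# Hypothetical feature maps standing in for two convolutional layers.
rng = np.random.default_rng(0)
layer_a = rng.standard_normal((8, 7, 7))
layer_b = rng.standard_normal((8, 7, 7))

intra = bilinear_pool(layer_a, layer_a)     # intra-layer: a layer with itself
inter = bilinear_pool(layer_a, layer_b)     # inter-layer: two different layers
features = np.concatenate([intra, inter])   # descriptor using both interaction types
```

A model that uses only `intra` or only `inter` discards the other kind of interaction, which is the gap the abstract says most bilinear pooling models leave open.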



      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 1s
        February 2022
        352 pages
        ISSN: 1551-6857
        EISSN: 1551-6865
        DOI: 10.1145/3505206

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 January 2022
        • Accepted: 1 October 2021
        • Revised: 1 August 2021
        • Received: 1 January 2021

        Qualifiers

        • research-article
        • Refereed
