Abstract
Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection remains a challenge for existing models. In this paper, we argue that two factors in existing feature pyramid networks limit small object detection performance. First, shallow-layer and deep-layer features lie in different feature domains. Second, the accumulation of upper-layer features causes feature aliasing in the lower-layer features, which interferes with the representation of small objects. We therefore propose Unified and Enhanced Feature Pyramid Networks (UEFPN) to improve the APs and ARs of small object detection. UEFPN has three characteristics: (1) It uses the deep features of a high-resolution image and of the original image to form multi-scale features in a unified domain. (2) During multi-scale feature fusion, it learns the importance of upper-layer features with the Channel Attention Fusion (CAF) module, which mitigates feature aliasing and enhances the context information of shallow-layer features. (3) It can be quickly applied to different models. Extensive experiments show that models with UEFPN achieve significant improvements in small object detection over the baseline models.
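The fusion idea in characteristic (2) can be illustrated with a minimal sketch. The abstract does not specify the internals of CAF, so the code below assumes a squeeze-and-excitation-style gate: the upper-layer feature is pooled per channel, passed through a single linear layer and a sigmoid, and the resulting per-channel weights scale the upsampled upper-layer feature before it is added to the lower-layer feature. The function names, the single-layer gate, and the nearest-neighbour upsampling are all illustrative assumptions, not the paper's implementation.

```python
import math

def global_avg_pool(feat):
    """Average each channel of a [C][H][W] nested list down to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def channel_gate(feat, weights, bias):
    """Assumed SE-style gate: sigmoid(W @ pooled + b), one value in (0, 1) per channel."""
    pooled = global_avg_pool(feat)
    gate = []
    for w_row, b in zip(weights, bias):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gate.append(1.0 / (1.0 + math.exp(-z)))
    return gate

def upsample_nearest_2x(feat):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out

def fuse(lower, upper, weights, bias):
    """Add the gated, upsampled upper-layer feature to the lower-layer feature."""
    gate = channel_gate(upper, weights, bias)
    up = upsample_nearest_2x(upper)
    return [
        [[l + g * u for l, u in zip(l_row, u_row)]
         for l_row, u_row in zip(l_ch, u_ch)]
        for l_ch, u_ch, g in zip(lower, up, gate)
    ]

# Toy inputs (hypothetical): 2 channels, lower is 4x4, upper is 2x2.
lower = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
upper = [[[2.0] * 2 for _ in range(2)] for _ in range(2)]
weights = [[0.0, 0.0], [0.0, 0.0]]  # zero weights, so the gate is sigmoid(bias)
bias = [0.0, 0.0]                   # sigmoid(0) = 0.5 for every channel
fused = fuse(lower, upper, weights, bias)

# A strongly negative bias closes the gate, leaving the lower feature nearly unchanged.
suppressed = fuse(lower, upper, weights, [-50.0, -50.0])
```

In a plain top-down pyramid the upper-layer feature would be added with a fixed weight of 1. With a learned gate, channels of the upper layer that would blur small-object responses can be down-weighted, which is one plausible reading of how a channel attention module could reduce aliasing.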
UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection