research-article

UEFPN: Unified and Enhanced Feature Pyramid Networks for Small Object Detection

Published: 17 February 2023

Abstract

Object detection models based on feature pyramid networks have made significant progress in general object detection. However, small object detection remains a challenge for existing models. In this paper, we argue that two factors in existing feature pyramid networks inhibit small object detection performance. First, shallow-layer and deep-layer features lie in different feature domains. Second, the accumulation of upper-layer features causes a feature aliasing effect on the lower-layer features, which interferes with the representation of small objects. We therefore propose Unified and Enhanced Feature Pyramid Networks (UEFPN) to improve the APs and ARs of small object detection. UEFPN has three characteristics: (1) It uses the deep features of a high-resolution image and of the original image to form multi-scale features in a unified domain. (2) During multi-scale feature fusion, it learns the importance of upper-layer features with the Channel Attention Fusion (CAF) module, to mitigate the feature aliasing effect and enhance the context information of shallow-layer features. (3) It can be quickly applied to different models. Experimental results show that models with UEFPN achieve significant performance improvements in small object detection compared with the baseline models.
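
The abstract does not spell out the internals of the CAF module, so the following is only a minimal sketch of the general idea in characteristic (2): weighting upper-layer channels by a learned importance before adding them to the lower layer. It assumes a squeeze-and-excitation style gate (global average pooling, one linear layer, sigmoid) and nearest-neighbour 2x upsampling. The function names, the single-layer gate, and the toy weights are all hypothetical and are not taken from the paper.

```python
import math

def global_avg_pool(feat):
    """feat: list of C channels, each an H x W list of lists. Returns C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gates(feat, weights, bias):
    """Assumed gate: one linear layer over pooled channel descriptors, then sigmoid.
    In a real model `weights` and `bias` would be learned; here they are supplied."""
    pooled = global_avg_pool(feat)
    return [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, bias)
    ]

def upsample_nearest_2x(ch):
    """Nearest-neighbour upsampling of one H x W channel to 2H x 2W."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def attention_fuse(lower, upper, weights, bias):
    """Scale each upper-layer channel by its gate, upsample it, add it to the lower layer."""
    gates = channel_gates(upper, weights, bias)
    fused = []
    for low_ch, up_ch, g in zip(lower, upper, gates):
        up = upsample_nearest_2x(up_ch)
        fused.append([
            [l + g * u for l, u in zip(low_row, up_row)]
            for low_row, up_row in zip(low_ch, up)
        ])
    return fused, gates

# Toy input: 2 channels; upper layer is 1x1, lower layer is 2x2.
upper = [[[1.0]], [[2.0]]]
lower = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
# Zero weights and bias make every gate sigmoid(0) = 0.5 (illustrative values only).
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]

fused, gates = attention_fuse(lower, upper, weights, bias)
print(gates)   # [0.5, 0.5]
print(fused)   # [[[0.5, 0.5], [0.5, 0.5]], [[2.0, 2.0], [2.0, 2.0]]]
```

With a gate below 1, the upper-layer contribution is attenuated rather than added at full strength, which is one plausible way to limit how much accumulated upper-layer content overwrites the lower-layer representation.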

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
April 2023
545 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572861
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 17 February 2023
• Online AM: 8 September 2022
• Accepted: 1 September 2022
• Revised: 20 June 2022
• Received: 26 February 2022