research-article

FIN: Feature Integrated Network for Object Detection

Published: 22 May 2020

Abstract

Multi-layer detection is a widely used method in object detection. It extracts multiple feature maps with different resolutions from the backbone network to detect objects of different scales, which effectively copes with scale variation in object detection. Although multi-layer detection distributes the workload of a single detection layer across multiple detection layers and can improve detection accuracy to some extent, it has two limitations. First, manually assigning anchor boxes of different sizes to different feature maps depends too heavily on human experience. Second, there is a semantic gap between the detection layers: the same detector must simultaneously process layers with inconsistent semantic strength, which increases the difficulty of optimizing the detector. In this article, we propose a feature integrated network (FIN) based on single-layer detection to address these problems. Unlike existing methods, we design a series of verification experiments on a multi-layer detection model, which show that a shallow high-resolution feature map has the potential to detect objects of various scales simultaneously and effectively. Because the semantic information of the shallow feature map is weak, we propose two modules to enhance the representation ability of the single detection layer. First, we propose a detection adaptation network (DANet) to extract powerful feature maps that are useful for the object detection task. Second, we combine global context information and local detail information with a verified hourglass module (VHM) to generate a single feature map with high resolution and rich semantic information, so that all anchor boxes can be assigned to this detection layer.
In our model, all detection operations are concentrated on a single high-resolution feature map whose semantic and detailed information are enhanced as much as possible. The proposed model therefore avoids both the anchor-assignment problem and the inconsistent semantic strength between multiple detection layers mentioned above. Extensive experiments on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets show that our model performs well on objects of various sizes. The proposed model achieves 81.9 mAP when the size of the input image is 300 × 300.
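The contrast between the two anchor-assignment schemes described in the abstract can be sketched as follows. This is a hypothetical illustration: the layer names and anchor scales are SSD-style placeholders chosen for the example, not values taken from the paper.

```python
def multi_layer_assignment():
    """Multi-layer scheme: each feature map is manually given a
    subset of anchor scales, a split chosen by human experience."""
    # feature map name -> spatial resolution (illustrative values)
    feature_maps = {"conv4_3": 38, "conv7": 19, "conv8_2": 10}
    # hand-assigned anchor scales per map (illustrative values)
    anchor_scales = {"conv4_3": [30], "conv7": [60, 111], "conv8_2": [162]}
    return {name: [(scale, res) for scale in anchor_scales[name]]
            for name, res in feature_maps.items()}

def single_layer_assignment():
    """Single-layer scheme (as the abstract describes it): every anchor
    scale is placed on one high-resolution map, so no manual per-layer
    split is needed."""
    high_res = 38
    all_scales = [30, 60, 111, 162]
    return {"integrated_map": [(scale, high_res) for scale in all_scales]}

multi = multi_layer_assignment()
single = single_layer_assignment()

# Both schemes cover the same set of anchor scales; only the placement differs.
assert sum(len(v) for v in multi.values()) == len(single["integrated_map"])
```

The point of the sketch is that the single-layer scheme removes the hand-tuned mapping from scales to layers: every anchor lives on the one enhanced high-resolution map, and a single detector processes a feature map of uniform semantic strength.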


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 2 (May 2020), 390 pages
ISSN: 1551-6857, EISSN: 1551-6865, DOI: 10.1145/3401894
Copyright © 2020 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 June 2019
• Revised: 1 December 2019
• Accepted: 1 January 2020
• Online AM: 7 May 2020
• Published: 22 May 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
