Abstract
Multi-layer detection is a widely used method in object detection. It extracts multiple feature maps of different resolutions from the backbone network and uses them to detect objects of different scales, which effectively copes with scale variation among objects. Although multi-layer detection spreads the workload across several detection layers instead of one and can improve detection accuracy to some extent, it has two limitations. First, manually assigning anchor boxes of different sizes to different feature maps relies too heavily on human experience. Second, there is a semantic gap between the detection layers: the same detector must simultaneously process layers of inconsistent semantic strength, which makes the detector harder to optimize. In this article, we propose a feature integrated network (FIN) based on single-layer detection to address these problems. Unlike existing methods, we first design a series of verification experiments on a multi-layer detection model, which show that the shallow, high-resolution feature map has the potential to detect objects of various scales simultaneously and effectively. Because the semantic information of shallow feature maps is weak, we propose two modules to enhance the representational ability of the single detection layer. First, a detection adaptation network (DANet) extracts powerful feature maps suited to the object detection task. Second, a verified hourglass module (VHM) combines global context information with local detail information to generate a single feature map with high resolution and rich semantic information, so that all anchor boxes can be assigned to this one detection layer.
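The general hourglass idea of combining global context with local detail can be illustrated with a minimal, dependency-free Python sketch. It is a stand-in under stated assumptions, not the VHM itself: the abstract does not describe the module's layers, so convolutions are omitted, the map has a single channel, its side length must be divisible by `2 ** depth`, and the helper names (`avg_pool2x2`, `upsample2x`, `hourglass_fuse`) are hypothetical. The map is repeatedly pooled to a coarse, context-rich resolution, then upsampled and added back to the skip input at each level, so the output keeps the input's full resolution.

```python
# Hypothetical sketch of hourglass-style fusion on a single-channel 2D map.
# Real hourglass modules apply convolutions at every level; they are omitted here.


def avg_pool2x2(fmap):
    """Downsample by 2 with non-overlapping 2x2 average pooling (even sizes only)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    return [
        [fmap[i // 2][j // 2] for j in range(2 * len(fmap[0]))]
        for i in range(2 * len(fmap))
    ]


def hourglass_fuse(fmap, depth=2):
    """Pool down `depth` times for context, then upsample and add each level
    back to its same-resolution skip input to retain local detail."""
    if depth == 0:
        return fmap
    coarse = hourglass_fuse(avg_pool2x2(fmap), depth - 1)
    up = upsample2x(coarse)
    # Skip connection: element-wise sum of the fine input and upsampled context.
    return [[a + b for a, b in zip(row_f, row_u)] for row_f, row_u in zip(fmap, up)]


fmap = [[0.0] * 4 for _ in range(4)]
fmap[0][0] = 16.0  # a single strong local response
out = hourglass_fuse(fmap, depth=2)
print(out[0])  # [21.0, 5.0, 1.0, 1.0]
```

For this 4 × 4 input, every output cell receives at least the global mean (1.0) while the original peak survives at the top-left (21.0), a miniature version of a high-resolution map that also carries context.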
In our model, all detection operations are concentrated on one high-resolution feature map whose semantic and detail information are enhanced as much as possible. The proposed model therefore addresses both the anchor-assignment problem and the inconsistent semantic strength across multiple detection layers described above. Extensive experiments on the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets show that our model detects objects of various sizes well. With a 300 × 300 input image, the proposed model achieves 81.9 mAP.
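The anchor-assignment contrast between multi-layer and single-layer detection can be made concrete with a short Python sketch. The feature-map sizes and anchor sizes below are hypothetical, SSD-style values for a 300 × 300 input and are not settings reported for FIN; only square anchors are generated, and aspect ratios are omitted for brevity. In the multi-layer configuration each anchor size is hand-mapped to one feature map, whereas in the single-layer configuration every size is placed on the 38 × 38 map.

```python
from itertools import product

# Hypothetical, SSD-style settings for a 300 x 300 input (not FIN's settings).
IMAGE_SIZE = 300

# Multi-layer detection: each feature map (side length in cells) is manually
# assigned one anchor size in pixels.
MULTI_LAYER = {38: [30], 19: [60], 10: [111], 5: [162], 3: [213], 1: [264]}

# Single-layer detection: all anchor sizes share the one high-resolution map.
SINGLE_LAYER = {38: [30, 60, 111, 162, 213, 264]}


def generate_anchors(layer_config, image_size=IMAGE_SIZE):
    """Return square anchors as (cx, cy, w, h) tuples in pixels."""
    anchors = []
    for fmap_size, sizes in layer_config.items():
        stride = image_size / fmap_size  # pixels covered by one feature-map cell
        for i, j in product(range(fmap_size), repeat=2):
            # Anchor centres sit at the centre of each feature-map cell.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in sizes:
                anchors.append((cx, cy, float(s), float(s)))
    return anchors


multi = generate_anchors(MULTI_LAYER)
single = generate_anchors(SINGLE_LAYER)
print(len(multi), len(single))  # 1940 8664
```

With these assumed settings, the multi-layer configuration yields 1,940 anchors and the single-layer configuration yields 8,664 (38 × 38 cells × 6 sizes). The trade-off is that the single layer removes the manual size-to-layer mapping at the cost of a denser anchor set on one map.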
FIN: Feature Integrated Network for Object Detection