ABSTRACT
Occlusion, scale variation, and numerous false positives remain fundamental challenges in pedestrian detection. Intuitively, detecting pedestrians at various scales requires receptive fields of different sizes, and detecting pedestrians at various occlusion levels requires more attention to the visible parts. However, existing pedestrian detectors have not addressed these challenges well. This paper presents a novel convolutional network, denoted the box guided convolution network (BGCNet), to tackle these challenges simultaneously in a unified framework. In particular, we propose a box guided convolution (BGC) that dynamically adjusts the sizes of its convolution kernels under the guidance of the predicted bounding boxes. In this way, BGCNet provides position-aware receptive fields to address large variations in scale. In addition, to handle heavy occlusion, the kernel parameters of BGC are spatially localized around the salient and mostly visible key points of a pedestrian, such as the head and feet, to effectively capture high-level semantic features that help detection. Furthermore, a local maximum (LM) loss is introduced to suppress false positives and highlight true positives by forcing positives, rather than negatives, to be local maxima, without any additional inference burden. We evaluate BGCNet on popular pedestrian detection benchmarks and achieve state-of-the-art results, with significant performance improvements on heavily occluded and small-scale pedestrians.
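The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the 3x3 kernel layout, the convention that the predicted box is centred on the kernel position, and the hinge form of the penalty are all hypothetical and should not be read as the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_guided_sample_points(cx: float, cy: float,
                             box_w: float, box_h: float) -> List[Point]:
    """Return 3x3 sampling locations stretched to span a predicted box.

    Assumption: the box is centred on (cx, cy). A standard 3x3 kernel samples
    at offsets of -1, 0, +1 pixels; here the offsets are scaled to half the
    box width and height, so the top row falls on the head line and the
    bottom row on the foot line of the box.
    """
    points = []
    for dy in (-1, 0, 1):          # top, middle, bottom rows
        for dx in (-1, 0, 1):      # left, centre, right columns
            points.append((cx + dx * box_w / 2.0, cy + dy * box_h / 2.0))
    return points


def local_maximum_penalty(scores: List[List[float]], pos: Tuple[int, int],
                          radius: int = 1, margin: float = 0.0) -> float:
    """Hinge-style penalty that is zero when scores[pos] is a local maximum.

    Assumption: every neighbour within `radius` of the positive location
    should score at least `margin` below the positive; each violation adds
    its excess to the penalty.
    """
    r0, c0 = pos
    pos_score = scores[r0][c0]
    penalty = 0.0
    for r in range(max(0, r0 - radius), min(len(scores), r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(len(scores[0]), c0 + radius + 1)):
            if (r, c) != (r0, c0):
                penalty += max(0.0, scores[r][c] - pos_score + margin)
    return penalty
```

Under these assumptions, a small predicted box yields a compact sampling grid and a large box yields a wide one, which is one way to obtain position-aware receptive fields. The penalty applies only during training, which is consistent with the claim that the LM loss adds no inference cost.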
Supplemental Material
The results of BGCNet on the CrowdHuman dataset.
Index Terms
- Box Guided Convolution for Pedestrian Detection