ABSTRACT
Most existing single-stage and two-stage 3D object detectors are anchor-based, while efficient but challenging anchor-free single-stage 3D object detection has not been well investigated. Recent studies on 2D object detection show that anchor-free methods also have great potential. However, the unordered and sparse nature of point clouds prevents advanced 2D methods from being applied directly to 3D point clouds. We overcome this by converting the voxel-based sparse 3D feature volumes into sparse 2D feature maps. We propose an attentive module that densifies the sparse feature maps, mostly in the object regions, through a deformable convolution tower and supervised mask-guided attention. By directly regressing the 3D bounding box from the enhanced, dense feature maps, we construct a novel single-stage 3D detector for point clouds in an anchor-free manner. We also propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression. Our code is publicly available at https://github.com/jialeli1/MGAF-3DSSD.
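To make the idea of IoU-based confidence re-calibration concrete, the following is a minimal sketch rather than the released implementation. It assumes the detector outputs a classification score and a predicted IoU for each box, and that the two are combined as score × IoU^beta. The function name `recalibrate_confidence`, the exponent `beta`, and this particular combination rule are illustrative assumptions not stated in the abstract.

```python
def recalibrate_confidence(cls_scores, pred_ious, beta=1.0):
    """Scale each classification score by its predicted IoU raised to beta.

    Assumed form (not taken from the abstract): conf = score * iou ** beta.
    """
    recalibrated = []
    for score, iou in zip(cls_scores, pred_ious):
        iou = min(max(iou, 0.0), 1.0)  # clamp the predicted IoU to [0, 1]
        recalibrated.append(score * iou ** beta)
    return recalibrated


# Two hypothetical detections: the first has the higher classification
# score but a poorly localized box (low predicted IoU).
scores = [0.9, 0.8]
ious = [0.5, 0.9]
recal = recalibrate_confidence(scores, ious, beta=2.0)
best_before = max(range(len(scores)), key=lambda i: scores[i])
best_after = max(range(len(recal)), key=lambda i: recal[i])
```

Under these assumptions the better-localized second box is ranked first after re-calibration, which is the kind of agreement between confidence and regression accuracy that the scheme aims for.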
Index Terms
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud