ABSTRACT
In this paper, we present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder. To preserve the necessary information from all raw points while maintaining the high box recall of a voxel-based Region Proposal Network (RPN), we propose a residual voxel-to-point decoder that extracts point features in addition to the map-view features from the voxel-based RPN. We use 3D Region of Interest (RoI) alignment to crop the features and align them with the proposal boxes, so that the object position is perceived accurately. The RoI-aligned features are finally aggregated with corner geometry embeddings, which supply corner information that may otherwise be missing in the box refinement stage. We also propose a simple and efficient method to align the estimated IoUs with the refined proposal boxes, yielding a more relevant localization confidence. Comprehensive experiments on KITTI and the Waymo Open Dataset demonstrate that our method, with its novel architecture, achieves significant improvements over existing methods. The code is available on GitHub at https://github.com/jialeli1/From-Voxel-to-Point.
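To make two of the geometric quantities mentioned above more concrete, the following is a minimal Python sketch of how the eight corners of a 3D box and an IoU between two boxes could be computed. It is an illustration under stated assumptions, not the authors' implementation: the box parameterization (center, length, width, height, yaw about the vertical axis), the function names `box_corners` and `axis_aligned_iou_3d`, and the simplification to axis-aligned boxes for the IoU are all hypothetical choices made here. Detectors of this kind typically use a rotated-box IoU, which this sketch does not implement.

```python
import math


def box_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a box rotated by `yaw` (radians) about the z axis.

    Assumed parameterization (not taken from the paper): center (cx, cy, cz),
    size (length, width, height) along the box's own x, y, z axes.
    """
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for sz in (-0.5, 0.5):  # bottom face first, then top face
        for sx, sy in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
            dx, dy = sx * length, sy * width
            corners.append((
                cx + dx * cos_y - dy * sin_y,  # rotate the offset, then translate
                cy + dx * sin_y + dy * cos_y,
                cz + sz * height,
            ))
    return corners


def axis_aligned_iou_3d(box_a, box_b):
    """IoU of two axis-aligned boxes given as (cx, cy, cz, length, width, height).

    Simplification: yaw is ignored, and both boxes must have positive volume.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis] - box_a[axis + 3] / 2, box_b[axis] - box_b[axis + 3] / 2)
        hi = min(box_a[axis] + box_a[axis + 3] / 2, box_b[axis] + box_b[axis + 3] / 2)
        inter *= max(0.0, hi - lo)  # overlap length along this axis, zero if disjoint
    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    return inter / (vol_a + vol_b - inter)
```

In an IoU-guided detector, a value like the one returned by `axis_aligned_iou_3d` (computed against the ground truth during training) would serve as the regression target for the estimated IoU, which is then used as a localization confidence at inference time.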
Index Terms
- From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder