DOI: 10.1145/3474085.3475314
research-article

From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder

Published: 17 October 2021

ABSTRACT

In this paper, we present an Intersection-over-Union (IoU) guided two-stage 3D object detector with a voxel-to-point decoder. To preserve the necessary information from all raw points while maintaining the high box recall of a voxel-based Region Proposal Network (RPN), we propose a residual voxel-to-point decoder that extracts point features in addition to the map-view features from the voxel-based RPN. We use 3D Region of Interest (RoI) alignment to crop and align the features with the proposal boxes so that the object position is perceived accurately. The RoI-aligned features are then aggregated with corner geometry embeddings, which supply corner information that may otherwise be missing in the box refinement stage. We propose a simple and efficient method to align the estimated IoUs with the refined proposal boxes, yielding a more relevant localization confidence. Comprehensive experiments on KITTI and the Waymo Open Dataset demonstrate that our method achieves significant improvements over existing methods. The code is available on GitHub: https://github.com/jialeli1/From-Voxel-to-Point
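As an illustration of the quantity the detector is guided by, the following is a minimal sketch of IoU between two 3D boxes. It is not the paper's implementation: it assumes axis-aligned boxes in a hypothetical (xmin, ymin, zmin, xmax, ymax, zmax) format and ignores box rotation, which real 3D detection boxes carry. The function name `iou_3d_axis_aligned` is chosen here for illustration only.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes.

    Each box is (xmin, ymin, zmin, xmax, ymax, zmax). This is a simplified,
    assumed format: rotation (yaw) is ignored.
    """
    # Intersection volume: product of the overlap length along each axis.
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis, so no intersection
        inter *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union


# Two 2x2x2 boxes shifted by 1 along x overlap in a 1x2x2 region.
overlap = iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2))
```

A value near 1 means the two boxes almost coincide and a value of 0 means they are disjoint, which is why an estimated IoU can serve as a localization confidence for a refined box.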


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States
