Multi-feature Fusion VoteNet for 3D Object Detection

Abstract
In this article, we propose a Multi-feature Fusion VoteNet (MFFVoteNet) framework to improve 3D object detection in cluttered and heavily occluded scenes. Our method takes a point cloud and the synchronized RGB image as input and produces object detection results in 3D space. The detection architecture is built on VoteNet with three key designs. First, we augment the VoteNet input with point color information to sharpen the distinction between instances in a scene. Second, we integrate an image feature module into VoteNet to provide a strong object class signal that facilitates reliable detection under occlusion. Third, we propose a Projection Non-Maximum Suppression (PNMS) method for 3D object detection that eliminates redundant proposals and thus yields more accurate localization of 3D objects. We evaluate the proposed MFFVoteNet on two challenging 3D object detection datasets, ScanNetv2 and SUN RGB-D. Extensive experiments show that our framework effectively improves 3D object detection performance.
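As a rough illustration of the general idea behind projection-based suppression (not the authors' exact PNMS formulation, whose details are not given in the abstract), the sketch below runs greedy non-maximum suppression on ground-plane (XY) projections of axis-aligned 3D proposal boxes. The function names, box encoding, and IoU threshold are all illustrative assumptions.

```python
import numpy as np

def iou_2d(a, b):
    # a, b: 2D boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def projection_nms(boxes3d, scores, iou_thresh=0.25):
    """Greedy NMS on ground-plane projections of axis-aligned 3D boxes.

    boxes3d: (N, 6) array of (x1, y1, z1, x2, y2, z2) corners.
    scores:  length-N confidence scores.
    Returns indices of kept proposals, highest score first.
    """
    boxes3d = np.asarray(boxes3d, dtype=float)
    proj = boxes3d[:, [0, 1, 3, 4]]          # drop z -> (x1, y1, x2, y2)
    order = np.argsort(-np.asarray(scores))  # descending by confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou_2d(proj[i], proj[j]) for j in rest])
        order = rest[ious <= iou_thresh]     # suppress heavily overlapping proposals
    return keep
```

In this sketch, a proposal that overlaps a higher-scoring proposal in the projected plane beyond the threshold is discarded, which removes duplicate detections of the same object instance.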