Multi-feature Fusion VoteNet for 3D Object Detection

Published: 27 January 2022

Abstract

In this article, we propose Multi-feature Fusion VoteNet (MFFVoteNet), a framework that improves 3D object detection in cluttered and heavily occluded scenes. The method takes a point cloud and a synchronized RGB image as inputs and outputs object detections in 3D space. The detection architecture builds on VoteNet and adds three key designs. First, we augment the VoteNet input with point color information to make different instances in a scene easier to distinguish. Second, we integrate an image feature module into VoteNet to provide a strong object-class signal that supports more confident detections under occlusion. Third, we propose a Projection Non-Maximum Suppression (PNMS) method for 3D object detection that eliminates redundant proposals and thereby localizes 3D objects more accurately. We evaluate MFFVoteNet on two challenging 3D object detection datasets, ScanNetv2 and SUN RGB-D. Extensive experiments show that the framework effectively improves 3D object detection performance.
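
The first and third designs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how colors are encoded or which projection PNMS uses. The sketch assumes that RGB values are scaled to [0, 1] and appended to each XYZ point, that boxes are axis-aligned with z pointing up, and that PNMS can be approximated by greedy suppression on the overlap of the boxes' ground-plane footprints. The names `augment_with_color`, `projected_iou`, `projection_nms`, and the threshold of 0.25 are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]
# Assumed box layout: (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned, z is "up".
Box3D = Tuple[float, float, float, float, float, float]


def augment_with_color(
    points: Sequence[Point], colors: Sequence[Color]
) -> List[Tuple[float, ...]]:
    """Append RGB (assumed 0-255, scaled to [0, 1]) to each XYZ point."""
    if len(points) != len(colors):
        raise ValueError("points and colors must have the same length")
    return [
        (x, y, z, r / 255.0, g / 255.0, b / 255.0)
        for (x, y, z), (r, g, b) in zip(points, colors)
    ]


def projected_iou(a: Box3D, b: Box3D) -> float:
    """IoU of two boxes after dropping z, i.e. of their x-y footprints."""
    overlap_x = max(0.0, min(a[3], b[3]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[4], b[4]) - max(a[1], b[1]))
    inter = overlap_x * overlap_y
    area_a = (a[3] - a[0]) * (a[4] - a[1])
    area_b = (b[3] - b[0]) * (b[4] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def projection_nms(
    boxes: Sequence[Box3D],
    scores: Sequence[float],
    iou_threshold: float = 0.25,  # hypothetical value, not taken from the article
) -> List[int]:
    """Greedy NMS on projected footprints; returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(projected_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    proposals = [
        (0.0, 0.0, 0.0, 2.0, 2.0, 1.0),  # strongest proposal
        (0.1, 0.0, 0.5, 2.1, 2.0, 2.0),  # near-duplicate footprint, different height
        (5.0, 5.0, 0.0, 6.0, 6.0, 1.0),  # separate object
    ]
    print(projection_nms(proposals, [0.9, 0.8, 0.7]))
```

In this toy input the second proposal shares almost the same footprint as the first and is suppressed, while the third, which does not overlap either, is kept. A projection onto the image plane or onto several planes would be an equally plausible reading of "projection" and would change only `projected_iou`.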

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
January 2022
517 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3505205

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 December 2020
• Revised: 1 March 2021
• Accepted: 1 April 2021
• Published: 27 January 2022