
Cyclic Self-attention for Point Cloud Recognition

Published: 23 January 2023

Abstract

Point clouds provide a flexible geometric representation for computer vision research. However, their demands on the number of input points and on computer hardware remain significant challenges that hinder deployment in real applications. To address these challenges, we design a simple and effective module, the cyclic self-attention module (CSAM). Specifically, CSAM cyclically pairs the feature maps of the same input to produce three attention maps, thoroughly exploring the attention space of the original input. CSAM can therefore capture the correlations between points and obtain rich feature information even when the number of input points is reduced by a multiplicative factor. Meanwhile, it directs computation toward the most essential features, relieving the burden on computer hardware. By simply stacking CSAMs, we build a point cloud classification network, the cyclic self-attention network (CSAN). We also propose a novel framework for point cloud semantic segmentation, the full cyclic self-attention network (FCSAN), which adaptively fuses the original mapped features with the features extracted by CSAM to better capture the contextual information of point clouds. Extensive experiments on several benchmark datasets show that our methods achieve competitive performance on classification and segmentation tasks.
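The cyclic pairing described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes three learned projections f, g, h of the input point features, pairs them cyclically as (f, g), (g, h), (h, f) to form three attention maps over the same input, and sums the three attended outputs. The projection weights and the fusion rule (a plain sum) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cyclic_self_attention(x, w_f, w_g, w_h):
    """Sketch of a cyclic self-attention module (CSAM).

    x: (N, C) point features; w_*: (C, C) projection weights.
    Three feature maps f, g, h of the same input are paired
    cyclically -- (f, g), (g, h), (h, f) -- giving three N x N
    attention maps; their attended outputs are summed.
    """
    f, g, h = x @ w_f, x @ w_g, x @ w_h
    a_fg = softmax(f @ g.T)   # attention map from pair (f, g)
    a_gh = softmax(g @ h.T)   # attention map from pair (g, h)
    a_hf = softmax(h @ f.T)   # attention map from pair (h, f)
    # Fuse: apply each attention map to the input and sum
    # (an illustrative choice, not the paper's exact fusion).
    return a_fg @ x + a_gh @ x + a_hf @ x

rng = np.random.default_rng(0)
N, C = 128, 64  # 128 points, 64 channels
x = rng.standard_normal((N, C))
w_f, w_g, w_h = (rng.standard_normal((C, C)) * 0.05 for _ in range(3))
y = cyclic_self_attention(x, w_f, w_g, w_h)
print(y.shape)  # (128, 64)
```

Because each attention map is N x N over the same point set, reducing the number of input points shrinks all three maps quadratically, which is consistent with the abstract's claim that CSAM tolerates a multiplicative decrease in inputs.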



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1s (February 2023), 504 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3572859
  Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 23 January 2023
• Online AM: 24 June 2022
• Accepted: 16 May 2022
• Revised: 23 March 2022
• Received: 13 September 2021

Published in TOMM Volume 19, Issue 1s

Qualifiers

• research-article
• Refereed
