Abstract
Point clouds provide a flexible geometric representation for computer vision research. However, their heavy demands on the number of input points and on computing hardware remain significant challenges that hinder deployment in real applications. To address these challenges, we design a simple and effective module named the cyclic self-attention module (CSAM). Specifically, three attention maps of the same input are obtained by cyclically pairing the feature maps, so that the attention space of the original input is explored thoroughly. CSAM can adequately explore the correlation between points and obtain sufficient feature information even when the number of input points is reduced by a multiplicative factor. Meanwhile, it directs computation toward the more essential features, relieving the burden on the hardware. By simply stacking CSAMs, we build a point cloud classification network called the cyclic self-attention network (CSAN). We also propose a novel framework for point cloud semantic segmentation called the full cyclic self-attention network (FCSAN). By adaptively fusing the original mapping features with the features extracted by CSAM, FCSAN better captures the context information of point clouds. Extensive experiments on several benchmark datasets show that our methods achieve competitive performance in classification and segmentation tasks.
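The cyclic pairing described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration only: three linear projections of the same input stand in for the feature maps, each attention map is a scaled dot-product softmax over one cyclic pair, each map is applied to the remaining feature map, and the three results are averaged. The function names (`attention_map`, `cyclic_self_attention`) are hypothetical and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(f_a, f_b):
    """Assumed form: scaled dot-product attention between two feature maps."""
    d = len(f_a[0])
    scores = matmul(f_a, transpose(f_b))
    return softmax_rows([[v / math.sqrt(d) for v in row] for row in scores])


def cyclic_self_attention(x, weights):
    """Sketch of cyclic pairing (assumption, not the paper's exact module).

    x: N points, each with C input features.
    weights: three C x D projection matrices producing three feature maps.
    Returns the fused N x D output and the three N x N attention maps.
    """
    feats = [matmul(x, w) for w in weights]
    d = len(feats[0][0])
    fused = [[0.0] * d for _ in x]
    maps = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3  # cyclic pairs: (0,1), (1,2), (2,0)
        a = attention_map(feats[i], feats[j])
        maps.append(a)
        attended = matmul(a, feats[k])  # apply the map to the remaining feature map
        fused = [[f + v for f, v in zip(fr, vr)] for fr, vr in zip(fused, attended)]
    fused = [[v / 3.0 for v in row] for row in fused]  # simple average as the fusion
    return fused, maps


rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]  # 8 points, xyz
weights = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(3)]
output, attn_maps = cyclic_self_attention(points, weights)
print(len(attn_maps), len(output), len(output[0]))
```

In this sketch each of the three feature maps acts once as the first member of a pair, once as the second, and once as the attended values, which is one plausible reading of "cyclically pairing"; the actual CSAM may pair or fuse the maps differently.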
Cyclic Self-attention for Point Cloud Recognition