research-article

Dual-Stream Structured Graph Convolution Network for Skeleton-Based Action Recognition

Published: 12 November 2021

Abstract

In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) for skeleton-based action recognition. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the graph convolution learning process on both the video and skeleton modalities. To effectively represent the skeletal graph of discrete joints, we design a structured graph convolution module that encodes partitioned body parts along with their dynamic interactions in the spatio-temporal sequence. In more detail, we build a set of structured intra-part graphs, each of which represents a distinctive body part (e.g., left arm, right leg, head). An inter-part graph is then constructed to model the dynamic interactions across different body parts; each of its nodes corresponds to an intra-part graph built above, while an edge between two nodes expresses their interaction within human movement. We perform graph convolution learning on both the intra- and inter-part graphs to capture the inherent characteristics and dynamic interactions, respectively, of human action. After integrating the intra- and inter-level spatial context/coordinate cues, a convolution filtering process is conducted over time slices to capture the temporal dynamics of human motion. Finally, we fuse the two streams of graph convolution responses to predict the category of human action in an end-to-end fashion. Comprehensive experiments on five single/multi-modal benchmark datasets (NTU RGB+D 60, NTU RGB+D 120, MSR-Daily 3D, N-UCLA, and HDM05) demonstrate that the proposed DS-SGCN framework achieves encouraging performance on the skeleton-based action recognition task.
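The intra-/inter-part pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the 10-joint skeleton, the five-part partition, the fully connected intra-part adjacency, the feature sizes, and the random weights are all illustrative assumptions; it only shows the two-level structure (graph convolution within each body part, then graph convolution over a graph whose nodes are the pooled parts).

```python
import numpy as np

# Hypothetical part partition for a toy 10-joint skeleton (an assumption,
# not the paper's actual joint layout).
PARTS = {
    "head":      [0, 1],
    "left_arm":  [2, 3],
    "right_arm": [4, 5],
    "left_leg":  [6, 7],
    "right_leg": [8, 9],
}

def normalized_adj(n):
    """Fully connected graph with self-loops, symmetrically normalized."""
    a = np.ones((n, n))
    d = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d @ a @ d

def graph_conv(x, a, w):
    """One graph-convolution layer: aggregate over A, project with W, ReLU."""
    return np.maximum(a @ x @ w, 0.0)

def ds_sgcn_step(joints, c_in=3, c_out=8, rng=np.random.default_rng(0)):
    """joints: (num_joints, c_in) features for one frame of one stream."""
    w_intra = rng.standard_normal((c_in, c_out))
    # Intra-part convolution: one structured graph per body part,
    # mean-pooled into a single feature vector per part.
    part_feats = []
    for idx in PARTS.values():
        x = joints[idx]
        part_feats.append(graph_conv(x, normalized_adj(len(idx)), w_intra).mean(axis=0))
    # Inter-part convolution: each node is a whole body part.
    z = np.stack(part_feats)                      # (num_parts, c_out)
    w_inter = rng.standard_normal((c_out, c_out))
    return graph_conv(z, normalized_adj(len(PARTS)), w_inter)

frame = np.random.default_rng(1).standard_normal((10, 3))
out = ds_sgcn_step(frame)
print(out.shape)  # (5, 8): one feature vector per body part
```

In the full model, this per-frame response would then be filtered convolutionally along the time axis, and the skeleton-stream and video-stream responses fused for classification; those stages are omitted here for brevity.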



Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4 (November 2021), 529 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492437


Publisher: Association for Computing Machinery, New York, NY, United States
Publication History

• Published: 12 November 2021
• Accepted: 1 February 2021
• Revised: 1 January 2021
• Received: 1 June 2020
