A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics

Published: 22 May 2020

Abstract

Large-scale benchmarks provide a solid foundation for the development of action analytics. Most previous activity benchmarks focus on analyzing actions in RGB videos, and large-scale, high-quality benchmarks for multi-modal action analytics remain scarce. In this article, we introduce the PKU Multi-Modal Dataset (PKU-MMD), a new large-scale benchmark for multi-modal human action analytics. It consists of about 28,000 action instances and 6.2 million frames in total and provides high-quality multi-modal data sources, including RGB, depth, infrared radiation (IR), and skeletons. To make PKU-MMD more practical, the dataset comprises two subsets collected under different settings, namely Part I and Part II. Part I contains 1,076 untrimmed video sequences with 51 action classes performed by 66 subjects, while Part II contains 1,009 untrimmed video sequences with 41 action classes performed by 13 subjects. Compared to Part I, Part II is more challenging due to short action intervals, concurrent actions, and heavy occlusion. PKU-MMD can be leveraged in two scenarios: action recognition with trimmed video clips and action detection with untrimmed video sequences. For each scenario, we provide benchmark results on both subsets by evaluating a range of methods on different modalities under two evaluation protocols. Experimental results show that PKU-MMD poses a significant challenge to many state-of-the-art methods. We further show that features learned on PKU-MMD transfer well to other datasets. We believe this large-scale dataset will boost research on action analytics in the community.
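
The detection scenario on untrimmed sequences is typically scored with mean average precision (mAP) at a temporal intersection-over-union (IoU) threshold. The sketch below is a minimal, hypothetical illustration of that style of metric, not the authors' official evaluation code; the (start, end) interval representation in frames and the 0.5 threshold are assumptions for illustration.

```python
def temporal_iou(a, b):
    """IoU of two temporal intervals given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thresh=0.5):
    """Event-level AP for one action class.

    detections:   list of (start, end, score), any order
    ground_truth: list of (start, end)
    """
    detections = sorted(detections, key=lambda d: d[2], reverse=True)
    matched = [False] * len(ground_truth)
    tp, precisions = 0, []
    for rank, (s, e, _) in enumerate(detections, start=1):
        # Greedily match the highest-IoU unmatched ground-truth interval.
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if not matched[j]:
                iou = temporal_iou((s, e), gt)
                if iou > best:
                    best, best_j = iou, j
        if best >= iou_thresh:
            matched[best_j] = True
            tp += 1
            precisions.append(tp / rank)  # precision at each recall point
    return sum(precisions) / len(ground_truth) if ground_truth else 0.0
```

Under such a protocol, the reported mAP would be the mean of this per-class AP over all action classes at the chosen IoU threshold.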



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 2
  May 2020
  390 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3401894

        Copyright © 2020 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 May 2020
        • Online AM: 7 May 2020
        • Accepted: 1 September 2019
        • Revised: 1 August 2019
        • Received: 1 May 2019
Published in TOMM, Volume 16, Issue 2


        Qualifiers

        • research-article
        • Research
        • Refereed
