Abstract
Large-scale benchmarks provide a solid foundation for the development of action analytics. Most previous activity benchmarks focus on analyzing actions in RGB videos, and there is a lack of large-scale, high-quality benchmarks for multi-modal action analytics. In this article, we introduce the PKU Multi-Modal Dataset (PKU-MMD), a new large-scale benchmark for multi-modal human action analytics. It consists of about 28,000 action instances and 6.2 million frames in total and provides high-quality multi-modal data sources, including RGB, depth, infrared radiation (IR), and skeleton data. To make PKU-MMD more practical, the dataset comprises two subsets recorded under different settings, namely Part I and Part II. Part I contains 1,076 untrimmed video sequences with 51 action classes performed by 66 subjects, while Part II contains 1,009 untrimmed video sequences with 41 action classes performed by 13 subjects. Compared with Part I, Part II is more challenging due to short action intervals, concurrent actions, and heavy occlusion. PKU-MMD can be leveraged in two scenarios: action recognition with trimmed video clips and action detection with untrimmed video sequences. For each scenario, we provide benchmark performance on both subsets by evaluating representative methods on different modalities under two evaluation protocols. Experimental results show that PKU-MMD poses a significant challenge to many state-of-the-art methods. We further illustrate that features learned on PKU-MMD transfer well to other datasets. We believe this large-scale dataset will boost research in action analytics for the community.
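To make the detection scenario concrete, below is a minimal sketch of how temporal action detection on untrimmed sequences is commonly scored: predicted intervals are matched to ground-truth intervals by temporal Intersection-over-Union (tIoU) at a fixed threshold. The function names, data layout, and the 0.5 threshold are illustrative assumptions, not the dataset's official evaluation protocol.

```python
# Hedged sketch of tIoU-based matching for temporal action detection.
# All names and the threshold value are assumptions for illustration only.

def temporal_iou(a, b):
    """tIoU of two (start, end) intervals, in frames or seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_detections(predictions, ground_truth, threshold=0.5):
    """Greedily match predictions to ground truth, highest confidence first.

    predictions:  list of (start, end, label, score)
    ground_truth: list of (start, end, label)
    Returns the number of true positives at the given tIoU threshold;
    each ground-truth interval can be matched at most once.
    """
    unmatched = list(ground_truth)
    tp = 0
    for start, end, label, _ in sorted(predictions, key=lambda p: -p[3]):
        for gt in unmatched:
            if gt[2] == label and temporal_iou((start, end), gt[:2]) >= threshold:
                unmatched.remove(gt)
                tp += 1
                break
    return tp

if __name__ == "__main__":
    gt = [(10, 50, "wave"), (60, 90, "drink")]
    preds = [(12, 48, "wave", 0.9), (55, 95, "drink", 0.8), (0, 5, "wave", 0.3)]
    print(match_detections(preds, gt))  # 2 of the 3 predictions match
```

From such per-threshold matches, precision/recall and an mAP-style summary can be computed; varying the tIoU threshold yields the multiple evaluation protocols mentioned above.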
A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics