Abstract
Online human gesture recognition has a wide range of applications in computer vision, especially in human-computer interaction. The recent introduction of cost-effective depth cameras has started a new trend of research on body-movement gesture recognition. However, two major challenges remain: (i) how to continuously detect gestures in unsegmented streams, and (ii) how to distinguish different styles of the same gesture from other types of gestures. In this article, we address both problems with a new, effective, and efficient feature extraction method, Structured Streaming Skeleton (SSS), which uses a dynamic matching approach to construct a feature vector for each frame. Our comprehensive experiments on the MSRC-12 Kinect Gesture, Huawei/3DLife-2013, and MSR-Action3D datasets demonstrate performance superior to state-of-the-art approaches. We also demonstrate model selection based on the proposed SSS feature, and recommend squared-loss regression with l2,1-norm regularization as the classifier for best performance.
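For readers unfamiliar with the recommended classifier, squared-loss regression with l2,1-norm regularization is commonly written in the form below. This is a general sketch of the standard objective, not the article's own notation; the symbols are assumptions introduced here: X is an n-by-d matrix whose rows are per-frame feature vectors, Y is an n-by-c label indicator matrix, W is the d-by-c weight matrix, and λ > 0 is the regularization weight.

```latex
% Assumed notation (not from the article): X (n x d features), Y (n x c labels),
% W (d x c weights), \lambda > 0 (regularization strength).
\min_{W \in \mathbb{R}^{d \times c}} \; \lVert XW - Y \rVert_F^2 + \lambda \lVert W \rVert_{2,1},
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \Bigl( \sum_{j=1}^{c} W_{ij}^2 \Bigr)^{1/2}
```

Because the l2,1 norm sums the Euclidean norms of the rows of W, it tends to drive entire rows to zero, which amounts to selecting a subset of feature dimensions shared across all gesture classes.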
Index Terms
Structured Streaming Skeleton -- A New Feature for Online Human Gesture Recognition