Abstract
Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by an empirical statistic computed over all of its pose samples. Covariance has two major problems: (1) it is prone to singularity, so that actions cannot be represented properly, and (2) it lacks global action/pose-aware information, which limits its expressive and discriminative power. In this article, we propose a novel Bayesian covariance representation based on prior regularization to solve these problems. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over the local poses of an independent action. Then, a Global Informative Prior (GIP) is generated over global poses with sufficient statistics to regularize the covariance. In this way, (1) singularity is greatly relieved owing to the sufficient statistics, and (2) the global pose information in the GIP makes the Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so that the discriminative characteristics of actions are represented more clearly. Experimental results show that our Bayesian covariance with GIP effectively improves action recognition performance. On some datasets, it outperforms state-of-the-art variants based on kernels, temporal-order structures, and saliency-weighted attention, among others.
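The general idea of regularizing a per-action Gaussian covariance with a prior estimated from poses pooled over the whole training set can be illustrated with the standard conjugate inverse-Wishart update. The sketch below is not the authors' GIP construction: the inverse-Wishart form, the scaling of the prior matrix `psi`, the prior strength `nu`, the plug-in mean, the synthetic data, and the function names `mle_covariance`, `global_informative_prior` and `bayesian_covariance` are all hypothetical choices made for illustration. It shows how a sample covariance that is singular (fewer frames than feature dimensions) becomes positive definite once the global prior is added.

```python
import numpy as np


def mle_covariance(X):
    """Gaussian maximum likelihood covariance of the rows of X (n x d), normalised by n."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def global_informative_prior(X_global, nu):
    """Hypothetical inverse-Wishart prior IW(psi, nu) built from pooled poses.

    psi is scaled so that the prior mode psi / (nu + d + 1) equals the
    covariance of the pooled (global) poses; nu > d - 1 sets the prior strength.
    """
    d = X_global.shape[1]
    if nu <= d - 1:
        raise ValueError("inverse-Wishart requires nu > d - 1")
    psi = (nu + d + 1) * mle_covariance(X_global)
    return psi, nu


def bayesian_covariance(X_local, psi, nu):
    """Posterior mode (MAP) of one action's covariance under the prior IW(psi, nu).

    Assumption: the mean is plugged in from the local sample, so the scatter
    matrix is n * MLE covariance and the posterior is IW(psi + S, nu + n).
    """
    n, d = X_local.shape
    scatter = n * mle_covariance(X_local)
    return (psi + scatter) / (nu + n + d + 1)


# Synthetic stand-ins for pose features (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
d = 10
X_global = rng.standard_normal((1000, d))  # poses pooled over all actions
X_local = rng.standard_normal((5, d))      # one short action: fewer frames than dimensions

sigma_mle = mle_covariance(X_local)
psi, nu = global_informative_prior(X_global, nu=d + 2)
sigma_bayes = bayesian_covariance(X_local, psi, nu)

print(np.linalg.matrix_rank(sigma_mle))            # at most n - 1 = 4, so singular
print(np.linalg.eigvalsh(sigma_bayes).min() > 0)   # True: positive definite
```

With `k = nu + d + 1`, the estimate reduces to the convex combination `(k * Sigma_global + n * Sigma_local) / (k + n)`, so the positive definite global term keeps it invertible whenever the pooled poses span all `d` dimensions, and a larger `nu` pulls it further toward the global covariance. A full Normal-inverse-Wishart treatment with an unknown mean would add an extra term to the posterior scale matrix.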
Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition