research-article

Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition

Published: 12 November 2021

Abstract

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by an empirical statistic computed over all of its pose samples. Covariance has two major problems: (1) it is prone to singularity, so actions may fail to be represented properly, and (2) it lacks global action/pose-aware information, which limits its expressive and discriminative power. In this article, we propose a novel Bayesian covariance representation that uses prior regularization to solve these problems. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over the local poses of an independent action. A Global Informative Prior (GIP) is then generated over global poses with sufficient statistics to regularize the covariance. As a result, (1) singularity is greatly relieved thanks to the sufficient statistics, and (2) the global pose information in the GIP makes the Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so the discriminative characteristics of actions are represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves the performance of action recognition. On some databases, it outperforms state-of-the-art variant methods based on kernels, temporal-order structures, and saliency weighting attentions, among others.
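
To make the idea of prior-regularized covariance concrete, here is a minimal NumPy sketch. It is not the article's formulation, which the abstract does not give. It assumes a generic conjugate-prior-style blend in which the per-action maximum likelihood covariance is combined with a covariance estimated over all poses, weighted by a hypothetical prior strength `kappa`. The function names, the blend formula, and the random stand-in data are all assumptions made for illustration.

```python
import numpy as np


def mle_covariance(poses):
    """Maximum likelihood covariance of poses with shape (n_frames, d)."""
    centered = poses - poses.mean(axis=0, keepdims=True)
    return centered.T @ centered / poses.shape[0]


def bayesian_covariance(local_poses, global_cov, kappa=10.0):
    """Blend the local MLE covariance with a global prior covariance.

    Assumed form (illustrative, not taken from the article):
        (n * S_local + kappa * S_global) / (n + kappa)
    where kappa is a hypothetical prior strength.
    """
    n = local_poses.shape[0]
    return (n * mle_covariance(local_poses) + kappa * global_cov) / (n + kappa)


rng = np.random.default_rng(0)
d = 6  # pose feature dimension (stand-in value)

# "Global" poses pooled over many actions: enough samples for a full-rank estimate.
global_poses = rng.normal(size=(500, d))
global_cov = mle_covariance(global_poses)

# One short action with fewer frames than dimensions: its MLE covariance is singular.
short_action = rng.normal(size=(3, d))
local_cov = mle_covariance(short_action)
reg_cov = bayesian_covariance(short_action, global_cov)

rank_local = np.linalg.matrix_rank(local_cov)
min_eig_reg = np.linalg.eigvalsh(reg_cov).min()
print("rank of local covariance:", rank_local, "of", d)
print("regularized covariance is positive definite:", bool(min_eig_reg > 0))
```

In this sketch the local estimate is rank-deficient because three frames cannot span six dimensions, while the blended estimate inherits full rank from the global term. That mirrors the abstract's claim that a prior built from sufficient statistics relieves singularity.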


            Published in

            ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
            November 2021
            529 pages
            ISSN: 1551-6857
            EISSN: 1551-6865
            DOI: 10.1145/3492437


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 November 2021
            • Accepted: 1 April 2021
            • Revised: 1 January 2021
            • Received: 1 July 2019

            Qualifiers

            • research-article
            • Refereed
