research-article

Robust Visual Tracking Using Kernel Sparse Coding on Multiple Covariance Descriptors

Published: 17 April 2020

Abstract

In this article, we aim to improve the performance of visual tracking by combining different features from multiple modalities. The core idea is to use covariance matrices as feature descriptors and then use sparse coding to encode the different features. Sparsity has been used successfully in visual tracking, where it is typically paired with appearance models built from intensity or color information. In this work, we step outside this trend and propose to model the target appearance with local covariance descriptors (CovDs) arranged in a pyramid structure. The pyramid structure not only lets us encode local and spatial information about the target appearance but also inherits useful properties of CovDs, such as invariance to affine transforms. Since CovDs are symmetric positive definite (SPD) matrices that lie on a Riemannian manifold, we further propose to perform tracking through sparse coding by embedding this manifold into an infinite-dimensional Hilbert space. The embedding allows us to perform sparse coding efficiently using the kernel trick. Our empirical study shows that the proposed tracking framework outperforms existing state-of-the-art methods in challenging scenarios.
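
The pipeline outlined in the abstract (local CovDs on a spatial pyramid, a kernel that implicitly maps the SPD manifold into a Hilbert space, and sparse coding that touches the data only through kernel values) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 5-D pixel feature vector (x, y, intensity, absolute gradients), the two-level 1×1 + 2×2 pyramid, the Gaussian kernel on the log-Euclidean distance (shown to be positive definite for every bandwidth by Jayasumana et al., CVPR 2013), the plain proximal-gradient (ISTA) solver, the parameter values, and the per-cell error-summing score are all illustrative choices, and every function name is hypothetical.

```python
# Illustrative sketch only: the features, kernel, solver and scoring rule below
# are assumptions, not the paper's exact method; all names are hypothetical.
import numpy as np


def pixel_features(patch):
    """Hypothetical 5-D per-pixel features: x, y, intensity, |dI/dx|, |dI/dy|."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = np.gradient(patch.astype(float))  # derivatives along rows, then columns
    return np.stack([xs, ys, patch, np.abs(dx), np.abs(dy)], axis=-1)


def region_covariance(features, eps=1e-6):
    """(H, W, d) feature map -> (d, d) covariance descriptor (CovD)."""
    samples = features.reshape(-1, features.shape[-1])
    cov = np.cov(samples, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])  # small ridge keeps the CovD strictly SPD


def pyramid_covds(patch, levels=2):
    """Level l splits the patch into a 2^l x 2^l grid and computes one CovD per cell."""
    h, w = patch.shape
    covds = []
    for level in range(levels):
        n = 2 ** level
        for i in range(n):
            for j in range(n):
                cell = patch[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n]
                covds.append(region_covariance(pixel_features(cell)))
    return covds


def spd_log(X):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(X)
    return (vecs * np.log(vals)) @ vecs.T


def log_euclidean_kernel(X, Y, sigma=1.0):
    """Gaussian kernel on the log-Euclidean distance between two SPD matrices."""
    d = np.linalg.norm(spd_log(X) - spd_log(Y), "fro")
    return float(np.exp(-d ** 2 / (2 * sigma ** 2)))


def kernel_sparse_code(K_DD, k_Dx, lam=0.05, iters=500):
    """ISTA for min_a ||phi(x) - Phi(D) a||^2 + lam * ||a||_1 using kernel values only.

    Expanding the squared norm gives k(x, x) - 2 a^T k_Dx + a^T K_DD a, so the
    gradient of the smooth part is 2 (K_DD a - k_Dx).
    """
    L = 2 * np.linalg.eigvalsh(K_DD).max()  # Lipschitz constant of that gradient
    a = np.zeros_like(k_Dx, dtype=float)
    for _ in range(iters):
        z = a - 2 * (K_DD @ a - k_Dx) / L  # gradient step with step size 1/L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a


def reconstruction_error(k_xx, K_DD, k_Dx, a):
    """||phi(x) - Phi(D) a||^2 evaluated with the kernel trick."""
    return float(k_xx - 2 * a @ k_Dx + a @ K_DD @ a)


def candidate_error(template_covds, candidate_covds, sigma=1.0, lam=0.05):
    """Hypothetical scoring rule: sum of per-cell kernel reconstruction errors.

    template_covds is a list (one entry per template) of pyramid CovD lists;
    each pyramid cell gets its own dictionary built from the templates.
    """
    total = 0.0
    for c, atoms in enumerate(zip(*template_covds)):
        x = candidate_covds[c]
        K = np.array([[log_euclidean_kernel(p, q, sigma) for q in atoms] for p in atoms])
        k = np.array([log_euclidean_kernel(p, x, sigma) for p in atoms])
        a = kernel_sparse_code(K, k, lam)
        total += reconstruction_error(1.0, K, k, a)  # k(x, x) = 1 for a Gaussian kernel
    return total
```

The step that the kernel trick makes possible is `kernel_sparse_code`: because the objective expands into k(x, x), the vector k_Dx and the Gram matrix K_DD, the infinite-dimensional embedding never has to be formed explicitly. In a tracker, a score such as `candidate_error` would typically be evaluated for every candidate region, with the lowest error selecting the new target location; how candidates are sampled and how templates are updated are outside this sketch and would be further assumptions.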

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
      Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
      January 2020
      376 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3388236

      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 April 2020
      • Accepted: 1 September 2019
      • Revised: 1 August 2019
      • Received: 1 April 2019

      Qualifiers

      • research-article
      • Research
      • Refereed