Abstract
In this article, we aim to improve the performance of visual tracking by combining different features of multiple modalities. The core idea is to use covariance matrices as feature descriptors and then use sparse coding to encode the different features. The notion of sparsity has been successfully used in visual tracking, where it is typically used together with appearance models obtained from intensity/color information. In this work, we step outside this trend and propose to model the target appearance by local covariance descriptors (CovDs) in a pyramid structure. The proposed pyramid structure not only enables us to encode local and spatial information of the target appearance but also inherits useful properties of CovDs, such as invariance to affine transforms. Since CovDs lie on a Riemannian manifold, we further propose to perform tracking through sparse coding by embedding the Riemannian manifold into an infinite-dimensional Hilbert space. Embedding the manifold into a Hilbert space allows us to perform sparse coding efficiently using the kernel trick. Our empirical study shows that the proposed tracking framework outperforms existing state-of-the-art methods in challenging scenarios.
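To make the pipeline described above more concrete, the following is a minimal sketch of kernel sparse coding on covariance descriptors. It is not the authors' implementation: the abstract does not state which kernel, solver, or features are used, so the log-Euclidean Gaussian kernel, the plain proximal-gradient (ISTA) solver, and all function and parameter names (`covariance_descriptor`, `kernel_sparse_code`, `sigma`, `lam`) are assumptions chosen for illustration. The pyramid of local descriptors is omitted; each region is summarized by a single CovD.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Covariance descriptor of one region.

    features: (n_pixels, d) array of per-pixel feature vectors
    (which features to use is an assumption left to the caller).
    """
    c = np.cov(features, rowvar=False)
    # Small ridge keeps the matrix strictly positive definite.
    return c + eps * np.eye(c.shape[0])

def spd_log(c):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def log_euclidean_gaussian_kernel(xs, ys, sigma=1.0):
    """Assumed kernel: k(X, Y) = exp(-||log X - log Y||_F^2 / (2 sigma^2))."""
    lx = np.stack([spd_log(x).ravel() for x in xs])
    ly = np.stack([spd_log(y).ravel() for y in ys])
    d2 = ((lx[:, None, :] - ly[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_sparse_code(k_dd, k_dx, lam=0.01, n_iter=500):
    """Minimize 0.5 * a^T K_DD a - a^T k_Dx + lam * ||a||_1 with ISTA.

    This is the kernelized form of 0.5 * ||phi(x) - Phi(D) a||^2 + lam * ||a||_1,
    up to a constant; only kernel evaluations are needed (the kernel trick).
    """
    step = 1.0 / np.linalg.eigvalsh(k_dd)[-1]  # 1 / Lipschitz constant of the gradient
    a = np.zeros(k_dd.shape[0])
    for _ in range(n_iter):
        z = a - step * (k_dd @ a - k_dx)                          # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def reconstruction_error(a, k_dd, k_dx, k_xx=1.0):
    """Squared feature-space residual; k_xx = 1 for a Gaussian kernel."""
    return k_xx - 2.0 * (a @ k_dx) + a @ k_dd @ a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical dictionary of target templates and one candidate region.
    templates = [covariance_descriptor(rng.standard_normal((200, 5)) * s)
                 for s in (1.0, 2.0, 4.0, 8.0)]
    candidate = covariance_descriptor(rng.standard_normal((200, 5)) * 4.0)
    k_dd = log_euclidean_gaussian_kernel(templates, templates)
    k_dx = log_euclidean_gaussian_kernel(templates, [candidate])[:, 0]
    code = kernel_sparse_code(k_dd, k_dx)
    print("sparse code:", code)
    print("reconstruction error:", reconstruction_error(code, k_dd, k_dx))
```

In a tracker built along these lines, each candidate region would be scored by its reconstruction error against the template dictionary, with lower error indicating a better match. How candidates are sampled and how the dictionary is updated are not covered by this sketch.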
Index Terms
Robust Visual Tracking Using Kernel Sparse Coding on Multiple Covariance Descriptors