Spatial-temporal Regularized Multi-modality Correlation Filters for Tracking with Re-detection

Abstract
The development of multi-spectrum image sensing technology has generated great interest in exploiting information from multiple modalities (e.g., RGB and infrared) for solving computer vision problems. In this article, we investigate how to exploit information from the RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms have been developed that exploit multi-modality information in appearance modeling, they still face challenges arising mainly from the following aspects: (1) lack of robustness to large appearance changes and dynamic backgrounds, (2) failure to re-capture the object after tracking loss, and (3) difficulty in determining the reliability of different modalities. To address these issues and integrate multiple modalities effectively, we propose a new tracking-by-detection algorithm called the Adaptive Spatial-temporal Regularized Multi-Modality Correlation Filter. In particular, adaptive spatial-temporal regularization is imposed on the correlation filter framework: the spatial regularization suppresses effects from cluttered backgrounds, while the temporal regularization adaptively incorporates historical appearance cues to handle appearance changes. In addition, a dynamic modality weight learning algorithm is integrated into correlation filter training, which ensures that more reliable modalities gain more importance in target tracking. Experimental results demonstrate the effectiveness of the proposed method.
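To make the ingredients concrete, the following is a minimal sketch (not the authors' full solver) of a temporally regularized correlation filter per modality with response-driven modality weights. It makes several simplifying assumptions: single-channel features, spatial regularization approximated by a cosine window on the input patch, an impulse as the desired response (a Gaussian is used in practice), and illustrative values for the regularization weights `lam` and `mu`.

```python
import numpy as np

def train_filter(x, y, f_prev=None, lam=0.01, mu=0.1):
    """Per-modality correlation filter learned in the Fourier domain.

    Solves min_f ||x * f - y||^2 + lam*||f||^2 + mu*||f - f_prev||^2,
    whose element-wise closed form appears below. The mu term is the
    temporal regularization pulling the filter toward the previous
    frame's filter; f_prev=None is treated as a zero previous filter.
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    num = np.conj(X) * Y
    if f_prev is not None:
        num = num + mu * f_prev
    den = np.conj(X) * X + lam + mu
    return num / den  # filter in the Fourier domain

def respond(f, x):
    """Correlation response map for patch x under Fourier-domain filter f."""
    return np.real(np.fft.ifft2(f * np.fft.fft2(x)))

# Toy example: two modalities (e.g., RGB intensity and infrared patches).
rng = np.random.default_rng(0)
size = (32, 32)
y = np.zeros(size)
y[16, 16] = 1.0  # desired response: impulse at the target center
win = np.outer(np.hanning(32), np.hanning(32))  # crude stand-in for spatial reg.

patches = {"rgb": rng.standard_normal(size) * win,
           "ir":  rng.standard_normal(size) * win}
filters = {m: train_filter(p, y) for m, p in patches.items()}

# Dynamic modality weights: modalities with stronger response peaks
# (more reliable this frame) receive larger weights.
peaks = {m: respond(filters[m], patches[m]).max() for m in patches}
total = sum(peaks.values())
weights = {m: peaks[m] / total for m in peaks}

# Fused response map; the target is localized at its maximum.
fused = sum(weights[m] * respond(filters[m], patches[m]) for m in patches)
```

In the paper's formulation the spatial and temporal terms are solved jointly (typically via ADMM) and the modality weights are learned inside the filter training rather than set post hoc as above; this sketch only illustrates how each component shapes the solution.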