
Spatial-temporal Regularized Multi-modality Correlation Filters for Tracking with Re-detection

Published: 29 May 2021

Abstract

The development of multi-spectrum image sensing technology has generated great interest in exploiting information from multiple modalities (e.g., RGB and infrared) to solve computer vision problems. In this article, we investigate how to exploit information from RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms that exploit multi-modality information in appearance modeling have been developed, they still face challenges from the following aspects: (1) a lack of robustness to large appearance changes and dynamic backgrounds, (2) failure to re-capture the object after tracking loss, and (3) difficulty in determining the reliability of different modalities. To address these issues and integrate multiple modalities effectively, we propose a new tracking-by-detection algorithm called the Adaptive Spatial-temporal Regularized Multi-modality Correlation Filter. In particular, an adaptive spatial-temporal regularization is imposed on the correlation filter framework, in which the spatial regularization helps to suppress the effect of cluttered backgrounds, while the temporal regularization enables the adaptive incorporation of historical appearance cues to handle appearance changes. In addition, a dynamic modality-weight learning algorithm is integrated into correlation filter training, which ensures that more reliable modalities gain more importance in target tracking. Experimental results demonstrate the effectiveness of the proposed method.
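The three components described in the abstract (spatial regularization, temporal regularization, and dynamic modality weighting) can be combined in a single correlation-filter objective. The following is a minimal sketch of one plausible formulation, assuming an STRCF-style base objective; the symbols $\gamma_m$ (modality weights), $w$ (spatial weight map), and $\mu$ (temporal penalty) are illustrative notation, not necessarily the paper's exact formulation:

```latex
\min_{\{f_m\},\,\boldsymbol{\gamma}} \;
\sum_{m=1}^{M} \gamma_m \left(
  \frac{1}{2}\Big\| y - \sum_{d=1}^{D} x_m^{d} * f_m^{d} \Big\|^{2}
  + \frac{1}{2}\sum_{d=1}^{D} \big\| w \odot f_m^{d} \big\|^{2}
  + \frac{\mu}{2}\sum_{d=1}^{D} \big\| f_m^{d} - f_{m,t-1}^{d} \big\|^{2}
\right)
+ \lambda \,\|\boldsymbol{\gamma}\|_{2}^{2},
\qquad \text{s.t.}\;\; \gamma_m \ge 0,\;\; \sum_{m=1}^{M} \gamma_m = 1.
```

Here $x_m^{d}$ is channel $d$ of modality $m$'s feature map, $f_m^{d}$ the corresponding filter, and $y$ the desired (Gaussian-shaped) response. The spatial weight $w$ penalizes filter energy outside the target region to suppress background clutter; the $\mu$ term ties the current filter to the previous frame's filter $f_{m,t-1}$, incorporating historical appearance cues; and the learned weights $\gamma_m$ grow for modalities whose fitting residual is small, so more reliable modalities dominate. Objectives of this form are typically solved per frame with ADMM, alternating between the filters and the modality weights.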



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
  May 2021
  410 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3461621

      Copyright © 2021 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 May 2021
      • Accepted: 1 October 2020
      • Revised: 1 October 2019
      • Received: 1 May 2019


      Qualifiers

      • research-article
      • Research
      • Refereed
