research-article

Improving Feature Discrimination for Object Tracking by Structural-similarity-based Metric Learning

Published: 04 March 2022

Abstract

Existing approaches usually formulate tracking as an appearance-matching procedure. However, the appearance features learned by these trackers lack discriminative power because of weak feature-supervision constraints and inadequate exploitation of spatial context. To address this issue, this article proposes a novel appearance matching tracking (AMT) method that strengthens feature constraints and captures discriminative spatial representations. Specifically, we first introduce a triplet structural loss that applies a structural similarity constraint to the features in a triplet-metric form, which improves their learning capability. This loss leverages feature statistics to capture the complex interactions among visual parts. Second, we propose an adaptive matching module that uses a dual spatial enhancement module to reinforce the discrimination of target features. This design not only improves the representation of spatial context but also performs spatially dynamic feature selection by attending to target deformation. Moreover, the model introduces a simple but effective matching unit that intuitively evaluates the relative appearance differences between the target and the proposals. With the resulting discriminative features, AMT localizes the target precisely. Therefore, the adverse effect of the spatial suppression imposed by window functions is alleviated, enabling effective tracking of fast-moving objects. Extensive experiments on six public datasets show that AMT outperforms state-of-the-art methods and demonstrate the effectiveness of each of its components.
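The triplet structural loss described above can be illustrated with a minimal sketch. It assumes a global (single-window) structural similarity computed from the means, variances, and covariance of two flattened feature vectors, the conventional SSIM constants (K1 = 0.01, K2 = 0.03, dynamic range 1.0), and a standard hinge-style triplet form with a hypothetical `margin` parameter. The names `global_ssim` and `triplet_structural_loss` are illustrative only; this is not the authors' formulation, which may use windowed statistics, different constants, or a different triplet form.

```python
from typing import Sequence

# Assumed SSIM stabilising constants (conventional K1=0.01, K2=0.03 with a
# dynamic range of 1.0); the paper's actual values may differ.
C1 = (0.01 * 1.0) ** 2
C2 = (0.03 * 1.0) ** 2


def global_ssim(x: Sequence[float], y: Sequence[float]) -> float:
    """Structural similarity over two whole flattened feature vectors.

    Uses one global window instead of the usual sliding Gaussian window.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased sample variances and covariance: the "feature statistics".
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    denominator = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return numerator / denominator


def triplet_structural_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.5,  # hypothetical margin value
) -> float:
    """Hinge triplet loss that uses SSIM as the similarity measure.

    Returns zero once SSIM(anchor, positive) exceeds
    SSIM(anchor, negative) by at least `margin`.
    """
    sim_pos = global_ssim(anchor, positive)
    sim_neg = global_ssim(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


if __name__ == "__main__":
    target = [0.1, 0.4, 0.8, 0.3]
    same_object = [0.12, 0.38, 0.79, 0.33]
    distractor = [0.9, 0.2, 0.1, 0.7]
    print(triplet_structural_loss(target, same_object, distractor))
    print(triplet_structural_loss(target, distractor, same_object))
```

In this toy run, the first call returns 0.0 because `same_object` is positively correlated with `target` while `distractor` is anti-correlated, so the margin is already satisfied; swapping the two yields a positive loss. Unlike a plain Euclidean triplet loss, the similarity here depends on mean, variance, and covariance, which is one way to read "leverages feature statistics".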

      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 4
        November 2022
        497 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3514185
        • Editor: Abdulmotaleb El Saddik

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 March 2022
        • Accepted: 1 November 2021
        • Revised: 1 September 2021
        • Received: 1 March 2021

        Qualifiers

        • research-article
        • Refereed
