Abstract
Existing approaches usually formulate the tracking task as an appearance matching procedure. However, the appearance features in these trackers are insufficiently discriminative, owing to weak feature supervision constraints and inadequate exploitation of spatial context. To tackle this issue, this article proposes a novel appearance matching tracking (AMT) method that strengthens the feature constraints and captures discriminative spatial representations. Specifically, we first introduce a triplet structural loss function, which improves feature learning by imposing a structural similarity constraint, in triplet metric form, on the features. This loss leverages feature statistics to capture the complex interactions of visual parts. Second, we put forward an adaptive matching module that uses a dual spatial enhancement module to reinforce target feature discrimination. This not only boosts the representation ability of spatial context but also realizes spatially dynamic feature selection by attending to target deformation information. Moreover, the model introduces a simple but effective matching unit that directly evaluates the relative appearance differences between the target and the proposals. In addition, with the resulting discriminative features, AMT can localize the target precisely. The spatial suppression imposed by window functions can therefore be alleviated, allowing effective tracking of fast-moving objects. Extensive experiments show that AMT outperforms state-of-the-art methods on six public datasets and demonstrate the effectiveness of each component of AMT.
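To make the idea of a structural similarity constraint in triplet form concrete, the following is a minimal sketch and not the formulation used in AMT, which the abstract does not specify. It assumes the standard global SSIM index computed from feature means, variances, and covariance, a distance of 1 - SSIM, and a hinge-style triplet margin. The function names, the `margin` value, and the stabilizing constants `c1` and `c2` are illustrative assumptions, and the features are assumed to be non-negative (for example, post-ReLU activations).

```python
import numpy as np


def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two same-shaped feature arrays.

    Assumption: statistics are pooled over the whole array rather than over
    local windows; c1 and c2 are the usual small stabilizing constants.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def triplet_structural_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss over the (assumed) distance d = 1 - SSIM.

    The loss is zero once the positive is structurally closer to the anchor
    than the negative by at least `margin` (a hypothetical hyperparameter).
    """
    d_pos = 1.0 - ssim_global(anchor, positive)
    d_neg = 1.0 - ssim_global(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy usage with random non-negative "features" standing in for network outputs.
rng = np.random.default_rng(0)
target = rng.random(64)          # anchor: target template features
same_object = target.copy()      # positive: identical structure
distractor = 1.0 - target        # negative: inverted structure
loss_easy = triplet_structural_loss(target, same_object, distractor)
loss_hard = triplet_structural_loss(target, distractor, same_object)
```

In this toy setup the loss vanishes when the positive matches the anchor's structure and grows when the roles of positive and negative are swapped, which is the behavior a triplet constraint is meant to enforce. A trainable version would compute the same statistics on differentiable tensors.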
Index Terms
Improving Feature Discrimination for Object Tracking by Structural-similarity-based Metric Learning