Abstract
Video-based person re-identification (ReID) is challenging because video frames contain various kinds of interference. Recent approaches address this problem with temporal aggregation strategies. In this work, we propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction step and the temporal aggregation step. First, we introduce the Context Sensing Channel Attention (CSCA) module, which emphasizes responses from informative channels for each frame. These informative channels are identified with reference not only to each individual frame but also to the content of the entire sequence. CSCA therefore exploits both the individuality of each frame and the global context of the sequence. Second, we propose the Contrastive Feature Aggregation (CFA) module, which predicts frame weights for temporal aggregation. The weight of each frame is determined in a contrastive manner, i.e., not only by the quality of that frame but also by the average quality of the other frames in the sequence. As a result, CFA promotes the contribution of frames whose quality is high relative to the rest of the sequence. Extensive experiments on four datasets show that CSA-Net consistently achieves state-of-the-art performance.
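The abstract describes both modules only at a high level, so the following is a minimal NumPy sketch of one possible reading, not the authors' implementation. It assumes that CSCA behaves like a squeeze-and-excitation-style gate whose input concatenates each frame's pooled descriptor with the sequence mean, and that CFA scores frames with a linear quality scorer and subtracts the mean score of the other frames. The function names `csca` and `cfa`, the weight shapes, the reduction ratio `r`, and the sigmoid-then-normalize weighting are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def csca(feats, w1, w2):
    """Hypothetical context-sensing channel attention (illustrative only).

    feats: (T, C, H, W) feature maps for the T frames of one sequence.
    w1: (2C, C // r) and w2: (C // r, C) are assumed bottleneck weights.
    Returns the reweighted maps and the (T, C) channel weights.
    """
    frame_desc = feats.mean(axis=(2, 3))              # (T, C): each frame's own descriptor
    seq_ctx = frame_desc.mean(axis=0, keepdims=True)  # (1, C): context of the whole sequence
    seq_ctx = np.broadcast_to(seq_ctx, frame_desc.shape)
    z = np.concatenate([frame_desc, seq_ctx], axis=1) # (T, 2C): frame + sequence context
    hidden = np.maximum(z @ w1, 0.0)                  # ReLU bottleneck
    attn = sigmoid(hidden @ w2)                       # (T, C), each weight in (0, 1)
    return feats * attn[:, :, None, None], attn


def cfa(frame_vecs, w_q):
    """Hypothetical contrastive feature aggregation (illustrative only).

    frame_vecs: (T, C) pooled frame features; w_q: (C,) assumed linear quality scorer.
    Returns the (C,) sequence feature and the (T,) frame weights.
    """
    T = frame_vecs.shape[0]
    if T < 2:
        raise ValueError("contrastive weighting needs at least two frames")
    q = frame_vecs @ w_q                              # (T,): raw quality of each frame
    others_mean = (q.sum() - q) / (T - 1)             # mean quality of the *other* frames
    gate = sigmoid(q - others_mean)                   # > 0.5 if a frame beats its peers
    weights = gate / gate.sum()                       # normalize so weights sum to 1
    return weights @ frame_vecs, weights


# Toy usage with random features and weights (no trained parameters).
rng = np.random.default_rng(0)
T, C, H, W, r = 4, 8, 6, 3, 2
feats = rng.standard_normal((T, C, H, W))
w1 = 0.1 * rng.standard_normal((2 * C, C // r))
w2 = 0.1 * rng.standard_normal((C // r, C))
refined, attn = csca(feats, w1, w2)
pooled = refined.mean(axis=(2, 3))                    # (T, C) per-frame vectors
w_q = rng.standard_normal(C)
video_feat, weights = cfa(pooled, w_q)
print(video_feat.shape, weights.round(3))
```

The sigmoid gate in `cfa` is a deliberate choice in this sketch. If the contrastive scores were normalized with a softmax instead, subtracting the mean of the other frames would only rescale the raw scores: q_t - (S - q_t)/(T - 1) equals T q_t/(T - 1) minus a constant shared by all frames (S is the sum of all scores), and a softmax ignores shared constants.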
Context Sensing Attention Network for Video-based Person Re-identification