Abstract
The RGB-D cross-modal person re-identification (re-id) task aims to match a person of interest across the RGB and depth image modalities. The large discrepancy between these two modalities makes the task difficult. Few researchers have addressed it, and the deep networks in existing methods still cannot be trained in an end-to-end manner. This article therefore proposes an end-to-end network for RGB-D cross-modal person re-id. The network introduces a cross-modal relational branch to narrow the gap between the two heterogeneous image types. This branch models the correlations between arbitrary cross-modal sample pairs, and these correlations are constrained by heterogeneous interactive learning. The network also includes a dual-modal local branch, which captures the spatial contexts common to the two modalities. This branch uses shared attentive pooling to extract the spatial attention within each local region, and mutual contextual graph networks to extract the spatial relations between distinct local parts. Experimental results on two public benchmark datasets, BIWI and RobotPKU, show that our method outperforms state-of-the-art methods. In addition, we perform thorough ablation experiments to verify the effectiveness of each component of the proposed method.
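To make the idea of shared attentive pooling more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`attentive_pool`, `cosine`), the scoring vector `shared_w`, and the toy feature values are all hypothetical, and a single linear scoring vector is an assumed stand-in for whatever attention module the network actually uses. It only shows the general pattern of pooling a local region with softmax attention weights, reusing the same weights for both modalities, and then comparing the resulting RGB and depth part features.

```python
import math


def attentive_pool(region, w):
    """Pool a local region into one vector with softmax attention.

    region: list of per-position feature vectors (lists of floats).
    w: scoring vector (hypothetical); the same w is reused for RGB and
       depth regions, which is what "shared" means in this sketch.
    """
    # One scalar score per spatial position.
    scores = [sum(x * wi for x, wi in zip(vec, w)) for vec in region]
    # Subtract the maximum score for a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted average of the position vectors.
    dim = len(region[0])
    return [sum(a * vec[d] for a, vec in zip(attn, region)) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy local regions (two positions, two channels); values are made up.
rgb_region = [[1.0, 0.0], [0.0, 1.0]]
depth_region = [[0.9, 0.1], [0.2, 0.8]]
shared_w = [2.0, 0.0]  # assumed scoring vector shared by both modalities

rgb_part = attentive_pool(rgb_region, shared_w)
depth_part = attentive_pool(depth_region, shared_w)

# Similarity between the pooled RGB and depth part features.
part_similarity = cosine(rgb_part, depth_part)
```

In a trained model the scoring weights would be learned jointly with the backbone, and the pooled part features would feed the relation-modelling stages described in the abstract; here they are fixed by hand purely for illustration.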
An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification