research-article

An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification

Published: 4 March 2022

Abstract

The RGB-D cross-modal person re-identification (re-id) task aims to identify a person of interest across RGB and depth images. The large discrepancy between these two modalities makes the task difficult. The task has received little attention, and the deep networks used by existing methods still cannot be trained end to end. This article therefore proposes an end-to-end heterogeneous restraint network for RGB-D cross-modal person re-id. The network introduces a cross-modal relational branch to narrow the gap between the two heterogeneous image types: it models the rich correlations between all cross-modal sample pairs, which are constrained by heterogeneous interactive learning. The network also exploits a dual-modal local branch that captures the spatial contexts common to both modalities. This branch adopts shared attentive pooling to extract the spatial attention within each local region, and mutual contextual graph networks to model the spatial relations between distinct local parts. Experimental results on two public benchmark datasets, BIWI and RobotPKU, demonstrate that our method outperforms the state of the art. In addition, we perform thorough experiments that verify the effectiveness of each component of the proposed method.
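The abstract describes the two branches only at a high level. The NumPy sketch below illustrates one plausible reading of them; it is not the authors' implementation, and every function name, shape, and loss choice in it is an assumption. Cosine affinities between all RGB/depth embedding pairs stand in for the cross-modal relational branch, and a batch-hard cross-modal triplet loss stands in for the heterogeneous interactive learning constraint. For the dual-modal local branch, local regions are taken to be horizontal stripes, "shared attentive pooling" is modeled as a single attention vector applied to both modalities, and the mutual contextual graph network is reduced to one round of affinity-weighted message passing between parts.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Cross-modal relational branch (assumed form): correlate every RGB/depth pair.
def cross_modal_affinity(rgb, depth):
    """Cosine similarity between each RGB embedding (N, D) and each depth embedding (M, D)."""
    return l2_normalize(rgb) @ l2_normalize(depth).T  # (N, M)


def cross_modal_triplet_loss(rgb, depth, rgb_ids, depth_ids, margin=0.3):
    """Batch-hard triplet loss with RGB anchors and depth positives/negatives.

    Assumes every RGB identity has at least one same-identity and one
    different-identity depth sample in the batch.
    """
    dist = 1.0 - cross_modal_affinity(rgb, depth)            # cosine distance (N, M)
    same = rgb_ids[:, None] == depth_ids[None, :]             # identity match mask (N, M)
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)  # farthest matching depth sample
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)   # closest non-matching depth sample
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())


# Dual-modal local branch (assumed form): the same weights serve both modalities.
def attentive_part_pooling(fmap, w_att, num_parts):
    """Split a (C, H, W) map into horizontal stripes and attention-pool each stripe."""
    channels = fmap.shape[0]
    parts = []
    for stripe in np.array_split(fmap, num_parts, axis=1):
        flat = stripe.reshape(channels, -1)  # (C, positions in this stripe)
        alpha = softmax(w_att @ flat)        # spatial attention weights, sum to 1
        parts.append(flat @ alpha)           # (C,) attention-weighted average
    return np.stack(parts)                   # (num_parts, C)


def part_graph_update(parts, w_msg):
    """One message-passing step on a fully connected graph whose nodes are parts."""
    normed = l2_normalize(parts)
    adj = softmax(normed @ normed.T, axis=1)             # row-normalized part affinities (P, P)
    return parts + np.maximum(adj @ parts @ w_msg, 0.0)  # residual update with ReLU


rng = np.random.default_rng(0)
C, H, W, P = 16, 24, 8, 6
w_att = rng.standard_normal(C)             # shared attention vector (hypothetical)
w_msg = 0.1 * rng.standard_normal((C, C))  # shared graph projection (hypothetical)
rgb_map, depth_map = rng.standard_normal((2, C, H, W))
rgb_parts = part_graph_update(attentive_part_pooling(rgb_map, w_att, P), w_msg)
depth_parts = part_graph_update(attentive_part_pooling(depth_map, w_att, P), w_msg)
print(rgb_parts.shape, depth_parts.shape)  # (6, 16) (6, 16)

# Random per-image embeddings for a batch of two identities, two images each.
ids = np.array([0, 0, 1, 1])
rgb_emb, depth_emb = rng.standard_normal((2, 4, 32))
loss = cross_modal_triplet_loss(rgb_emb, depth_emb, ids, ids)
```

Passing the same `w_att` and `w_msg` to the RGB and depth calls is what keeps the local features in a common space in this sketch; a trained model would learn these weights jointly rather than sample them at random.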



      Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 4, November 2022, 497 pages
        ISSN: 1551-6857; EISSN: 1551-6865
        DOI: 10.1145/3514185
        Editor: Abdulmotaleb El Saddik


        Publisher: Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 4 March 2022
        • Accepted: 1 December 2021
        • Revised: 1 November 2021
        • Received: 1 June 2021

        Qualifiers

        • research-article
        • Refereed
