JoT-GAN: A Framework for Jointly Training GAN and Person Re-Identification Model

Published: 25 January 2022
Abstract

To cope with inadequate training data, many person re-identification (re-id) methods use generative adversarial networks (GANs) for data augmentation, but the GAN is typically trained independently of the re-id model. The coupling between the two, which could improve re-id performance, is thus ignored. In this work, we propose a general framework, JoT-GAN, to jointly train the GAN and the re-id model. It can simultaneously reach the optima of both the generator and the re-id model, with the training of each guided by the other through a discriminator. The re-id model is boosted for two reasons: (1) the adversarial training encourages it to fool the discriminator, and (2) the generated samples augment the training data. Extensive results on benchmark datasets show that, for re-id models trained with the identification loss as well as the triplet loss, the proposed joint training framework outperforms existing methods that use separate training and achieves state-of-the-art re-id performance.

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1s
      February 2022
      352 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3505206

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 January 2022
      • Accepted: 1 August 2021
      • Revised: 1 June 2021
      • Received: 1 January 2021

      Qualifiers

      • research-article
      • Refereed
