
Correlation Discrepancy Insight Network for Video Re-identification

Published: 17 December 2020

Abstract

Video-based person re-identification (ReID) aims to re-identify a specified person sequence in videos captured by disjoint cameras. Most existing methods ignore the quality discrepancy across frames and build their ReID models from all video frames. In addition, they represent a person only by his or her own characteristics (the self-characteristic), which does not adapt well to cross-camera variation. To address these problems, we propose a novel correlation discrepancy insight network for video-based person ReID. It consists of an unsupervised correlation insight model (CIM) for video purification and a discrepancy description network (DDN) for person representation. Concretely, CIM encodes person half-parts with kernelized correlation filters and evaluates frame quality by the cross-correlation across frames, so that discriminative video fragments can be selected. DDN then uses a compression network to generate a discrepancy descriptor from the selected fragments. Rather than relying only on the self-characteristic, this descriptor represents the target person through his or her discrepancies with other persons. Because of its advantage in handling cross-domain variation, the discrepancy descriptor is expected to provide a new pattern for object representation in cross-camera tasks. Experimental results on three public benchmarks demonstrate that the proposed method outperforms several state-of-the-art methods.
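The frame-purification idea behind CIM can be illustrated with a minimal NumPy sketch. It assumes a standard kernelized correlation filter with a Gaussian kernel, as in Henriques et al. (2015). Each frame is split at mid-height into upper and lower half-parts. A frame is then scored by the mean peak response that filters trained on the other frames produce on it. The selected fragment is the contiguous window with the highest mean score. The split rule, the scoring rule, the window search, the helper names, and all hyperparameters (`sigma`, `lam`, `label_sigma`, `length`) are illustrative assumptions, not the actual CIM configuration. Inputs are assumed to be equally sized grayscale frames.

```python
import numpy as np


def kernel_correlation(a, b, sigma):
    """Gaussian kernel between `a` and every cyclic shift of `b` (KCF, Henriques et al. 2015)."""
    a_hat, b_hat = np.fft.fft2(a), np.fft.fft2(b)
    # Inner products of `a` with all cyclic shifts of `b`, computed in the Fourier domain.
    cross = np.real(np.fft.ifft2(a_hat * np.conj(b_hat)))
    sq_dist = np.maximum(np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * cross, 0.0)
    return np.exp(-sq_dist / (sigma ** 2 * a.size))


def gaussian_label(shape, sigma):
    """Gaussian regression target whose peak sits at index (0, 0), i.e. zero shift."""
    h, w = shape
    ys, xs = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    return np.fft.ifftshift(np.exp(-0.5 * (ys ** 2 + xs ** 2) / sigma ** 2))


def train_filter(x, sigma=0.5, lam=1e-4, label_sigma=2.0):
    """Dual ridge-regression solution in the Fourier domain: alpha_hat = y_hat / (k_hat_xx + lam)."""
    k_xx = kernel_correlation(x, x, sigma)
    y = gaussian_label(x.shape, label_sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k_xx) + lam)


def peak_response(alpha_hat, x, z, sigma=0.5):
    """Peak of the detection response F^-1(k_hat_zx * alpha_hat) of a filter trained on `x`."""
    k_zx = kernel_correlation(z, x, sigma)
    return float(np.max(np.real(np.fft.ifft2(alpha_hat * np.fft.fft2(k_zx)))))


def split_halves(frame):
    """Hypothetical half-part split: upper and lower body at mid-height."""
    mid = frame.shape[0] // 2
    return frame[:mid], frame[mid:]


def frame_quality(frames, sigma=0.5):
    """Mean peak response that the other frames' half-part filters produce on each frame."""
    halves = [split_halves(f) for f in frames]
    filters = [[train_filter(h, sigma) for h in hs] for hs in halves]
    scores = np.zeros(len(frames))
    for j in range(len(frames)):
        responses = [
            peak_response(filters[i][p], halves[i][p], halves[j][p], sigma)
            for i in range(len(frames)) if i != j
            for p in range(2)
        ]
        scores[j] = np.mean(responses)
    return scores


def select_fragment(scores, length):
    """Start index of the contiguous window of `length` frames with the highest mean score."""
    window_means = np.convolve(scores, np.ones(length) / length, mode="valid")
    return int(np.argmax(window_means))


# Synthetic tracklet: eight noisy views of one appearance, with frame 1 corrupted.
rng = np.random.default_rng(0)
base = rng.random((64, 32))
frames = [base + 0.05 * rng.standard_normal(base.shape) for _ in range(8)]
frames[1] = rng.random(base.shape)  # stands in for an occluded or badly detected frame
scores = frame_quality(frames)
start = select_fragment(scores, length=4)
print("frame scores:", np.round(scores, 3))
print("selected fragment: frames", start, "to", start + 3)
```

In this synthetic example, the corrupted frame 1 should receive the lowest score. The selected four-frame window should therefore start after it. Scoring every frame against all others costs O(T²) filter evaluations. That is fine for short tracklets, but a cheaper reference, such as a single template, could be substituted for long sequences.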

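A similarly hedged sketch can illustrate the discrepancy idea behind DDN. Here the features of m reference persons are subtracted from the target's pooled fragment feature. A fixed random linear projection then stands in for the learned compression network. `discrepancy_descriptor`, the reference set, and the projection are all hypothetical. The demo also assumes that camera variation acts as an additive offset shared by everyone seen by the same camera. This is a deliberate simplification. Under it, the offset cancels in the discrepancies but remains in the raw (self-characteristic) features.

```python
import numpy as np


def discrepancy_descriptor(target, references, projection):
    """Hypothetical discrepancy descriptor.

    target:     (d,) pooled feature of the target's selected fragment
    references: (m, d) features of m other persons observed by the same camera
    projection: (k, m * d) fixed matrix standing in for the learned compression network
    """
    discrepancies = target[None, :] - references        # (m, d): target minus each reference
    compressed = projection @ discrepancies.reshape(-1)  # (k,) compressed discrepancy vector
    return compressed / (np.linalg.norm(compressed) + 1e-12)  # unit length for cosine matching


rng = np.random.default_rng(0)
d, m, k = 16, 6, 32
person = rng.standard_normal(d)      # camera-free appearance of the target person
other = rng.standard_normal(d)       # camera-free appearance of a different person
refs = rng.standard_normal((m, d))   # camera-free appearance of m reference persons
bias_a, bias_b = rng.standard_normal(d), rng.standard_normal(d)  # assumed additive camera offsets
projection = rng.standard_normal((k, m * d)) / np.sqrt(m * d)

# Self-characteristic features change with the camera...
raw_gap = np.linalg.norm((person + bias_a) - (person + bias_b))
# ...but under the additive assumption the offset cancels inside the discrepancies.
desc_a = discrepancy_descriptor(person + bias_a, refs + bias_a, projection)
desc_b = discrepancy_descriptor(person + bias_b, refs + bias_b, projection)
desc_other = discrepancy_descriptor(other + bias_b, refs + bias_b, projection)
print("raw feature gap across cameras:", round(float(raw_gap), 3))
print("descriptor gap, same person:", float(np.linalg.norm(desc_a - desc_b)))
print("descriptor gap, other person:", round(float(np.linalg.norm(desc_a - desc_other)), 3))
```

Under this additive assumption, `desc_a` and `desc_b` coincide even though the raw features differ, while a different person still receives a distinct descriptor. Real cross-camera changes are not purely additive, so the sketch only shows why differences can suppress camera-specific factors.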
