Abstract
Video-based person re-identification (ReID) aims to re-identify a specified person sequence in videos captured by disjoint cameras. Most existing works on this task use all video frames to develop a ReID method and thus ignore the quality discrepancy across frames. In addition, they adopt only the person's self-characteristic as the representation, which cannot adapt effectively to cross-camera variation. To this end, we propose a novel correlation discrepancy insight network for video-based person ReID, which consists of an unsupervised correlation insight model (CIM) for video purification and a discrepancy description network (DDN) for person representation. Concretely, CIM uses kernelized correlation filters to encode person half-parts and evaluates frame quality by the cross-correlation across frames in order to select discriminative video fragments. DDN then exploits the selected video fragments to generate a discrepancy descriptor with a compression network; the descriptor employs the target person's discrepancies from other persons to facilitate the representation, rather than relying only on the self-characteristic. Owing to its advantage in handling cross-domain variation, the discrepancy descriptor is expected to provide a new pattern for object representation in cross-camera tasks. Experimental results on three public benchmarks demonstrate that the proposed method outperforms several state-of-the-art methods.
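The frame-selection idea behind CIM can be illustrated with a much simpler stand-in. The sketch below is not the paper's model: it replaces the trained kernelized correlation filters with plain normalized cross-correlation computed in the Fourier domain, and it assumes grayscale frames of equal size that are split into an upper and a lower half. The function names (`peak_correlation`, `frame_quality`, `select_fragment`) and the equal weighting of the two halves are hypothetical choices made only for illustration.

```python
import numpy as np


def peak_correlation(a, b):
    """Peak of the circular normalized cross-correlation of two equal-size patches."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    # Correlation theorem: correlation = IFFT(FFT(a) * conj(FFT(b))).
    response = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    # Dividing by the patch size maps a perfect match to roughly 1.0.
    return response.max() / a.size


def frame_quality(frames):
    """Score each frame by its mean peak correlation with all other frames.

    Assumption: each frame is a 2-D grayscale array, and the upper and lower
    halves are scored separately and averaged with equal weight.
    """
    n = len(frames)
    h = frames[0].shape[0] // 2
    scores = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            upper = peak_correlation(frames[i][:h], frames[j][:h])
            lower = peak_correlation(frames[i][h:2 * h], frames[j][h:2 * h])
            scores[i] += 0.5 * (upper + lower)
    return scores / max(n - 1, 1)


def select_fragment(frames, length):
    """Return the start index of the contiguous fragment with the highest total score."""
    scores = frame_quality(frames)
    # Sliding-window sum of the per-frame scores over `length` consecutive frames.
    window_sums = np.convolve(scores, np.ones(length), mode="valid")
    return int(np.argmax(window_sums)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    person = rng.standard_normal((64, 32))  # synthetic stand-in for a person crop
    good = [person + 0.1 * rng.standard_normal(person.shape) for _ in range(6)]
    bad = [rng.standard_normal(person.shape) for _ in range(4)]  # e.g. occluded frames
    start, scores = select_fragment(good + bad, length=3)
    print("fragment starts at frame", start)
    print("per-frame scores:", np.round(scores, 2))
```

In this toy setting, frames that agree with most of the sequence receive high scores, while frames that correlate poorly with the rest (for example, heavily occluded ones) score low and are left out of the selected fragment. No labels are used at any point, which matches the unsupervised character of CIM described above.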
Index Terms
Correlation Discrepancy Insight Network for Video Re-identification