Abstract
Person re-identification aims to identify a given pedestrian across a network of non-overlapping cameras at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in still images; little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames present a pedestrian's appearance in different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. Furthermore, we formulate a loss function consisting of an identification loss and a center loss to simultaneously minimize intra-class variance and maximize inter-class variance, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
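To make the joint objective in the abstract more concrete, the following is a minimal sketch of how an identification loss and a center loss might be combined. It assumes the identification loss is a softmax cross-entropy over pedestrian identities and that the center loss is half the squared Euclidean distance between a feature and its class center, as is common in the literature. The function names, the weight `lam`, and the batch averaging are illustrative assumptions, not details taken from the article.

```python
import math


def identification_loss(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits_batch, features_batch, labels, centers, lam=0.5):
    """Batch mean of identification loss + lam * center loss.

    `lam` is a hypothetical balancing weight; `centers` maps each identity
    label to its (assumed already estimated) feature center.
    """
    total = 0.0
    for logits, feature, label in zip(logits_batch, features_batch, labels):
        total += identification_loss(logits, label)
        total += lam * center_loss(feature, centers[label])
    return total / len(labels)


if __name__ == "__main__":
    # Toy batch: two identities, 2-D features, one sample per identity.
    logits_batch = [[2.0, 0.5], [0.1, 1.5]]
    features_batch = [[1.0, 2.0], [-1.0, 0.0]]
    labels = [0, 1]
    centers = {0: [0.8, 1.9], 1: [-1.2, 0.1]}
    print(joint_loss(logits_batch, features_batch, labels, centers))
```

In this sketch the center term pulls features of the same identity toward a shared center, reducing intra-class variance, while the cross-entropy term keeps different identities separable. How the centers are updated during training is left out here.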
Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos