research-article

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Published: 24 January 2019

Abstract

Person re-identification aims to identify a given pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images; little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames present pedestrian appearance with different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. Moreover, we formulate a loss function consisting of an identification loss and a center loss to minimize intra-class variance and maximize inter-class variance simultaneously, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
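The abstract does not give the exact form of the combined loss, so the following is only a minimal sketch of how an identification loss and a center loss might be summed. It assumes the identification loss is softmax cross-entropy over pedestrian identities and the center loss is half the squared Euclidean distance between a feature and its class center; the balancing weight `lam`, the function names, and the toy values are hypothetical and not taken from the article.

```python
import math


def identification_loss(logits, label):
    """Softmax cross-entropy for one sample: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance from a feature to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(logits_batch, features, labels, centers, lam=0.5):
    """Batch mean of the identification loss plus lam times the mean center loss.

    lam is an assumed balancing weight, not a value reported in the article.
    """
    n = len(labels)
    id_term = sum(
        identification_loss(z, y) for z, y in zip(logits_batch, labels)
    ) / n
    center_term = sum(
        center_loss(f, centers[y]) for f, y in zip(features, labels)
    ) / n
    return id_term + lam * center_term


# Toy batch: two identities with 2-D features (illustrative values only).
logits_batch = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}

# Features that sit exactly on their class centers incur no center penalty.
loss_at_centers = joint_loss(
    logits_batch, [[1.0, 0.0], [0.0, 1.0]], labels, centers
)

# Moving the first feature away from its center raises the joint loss.
loss_off_center = joint_loss(
    logits_batch, [[3.0, 0.0], [0.0, 1.0]], labels, centers
)
```

In this sketch the identification term separates identities, while the center term penalizes features that drift from their own class center, which is one plausible reading of how the two terms reduce intra-class variance while keeping classes apart.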



        • Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 15, Issue 1s
          Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
          January 2019
          265 pages
          ISSN:1551-6857
          EISSN:1551-6865
          DOI:10.1145/3309769
          Copyright © 2019 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 January 2019
          • Accepted: 1 June 2018
          • Revised: 1 April 2018
          • Received: 1 October 2017

          Qualifiers

          • research-article
          • Research
          • Refereed
