Abstract
The superiority of deeply learned pedestrian representations has been reported in recent literature on person re-identification (re-ID). In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initial model, which is trained on an irrelevant labeled dataset. Because the clustering results can be very noisy, we add a selection operation between clustering and fine-tuning. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples that lie near cluster centroids in the feature space. As the model becomes stronger in subsequent iterations, more images are adaptively selected as CNN training samples. Pedestrian clustering and the CNN model thus improve together, step by step, until the algorithm converges. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve re-ID accuracy. Our code has been released at https://github.com/hehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
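The cluster, select, fine-tune loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the released implementation. It assumes that "reliable" means the cosine similarity between a sample and its cluster centroid exceeds a fixed `threshold`, and it uses a plain NumPy k-means as a stand-in for the clustering step. The names `extract_features`, `fine_tune`, `select_reliable`, and `pul_rounds` are hypothetical, with the two callbacks standing in for the CNN forward pass and the training step.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()

    def assign(cents):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((features[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(centroids)


def select_reliable(features, centroids, labels, threshold):
    """Indices of samples whose cosine similarity to their own centroid
    exceeds `threshold` (assumed reliability criterion)."""
    f = l2_normalize(features)
    c = l2_normalize(centroids)
    sims = (f * c[labels]).sum(axis=1)
    return np.flatnonzero(sims > threshold)


def pul_rounds(images, extract_features, fine_tune, k, threshold, n_rounds=3):
    """Alternate clustering, selection, and fine-tuning for a fixed number
    of rounds; returns how many samples were selected in each round."""
    selected_counts = []
    for _ in range(n_rounds):
        features = l2_normalize(extract_features(images))
        centroids, labels = kmeans(features, k)
        keep = select_reliable(features, centroids, labels, threshold)
        # Cluster indices act as pseudo-labels for the selected samples.
        fine_tune(images[keep], labels[keep])
        selected_counts.append(len(keep))
    return selected_counts
```

Under these assumptions the threshold stays fixed across rounds. If fine-tuning tightens the clusters, more samples would clear it in later rounds, which is one way to read the adaptive, self-paced selection the abstract describes.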
Unsupervised Person Re-identification: Clustering and Fine-tuning