Abstract
Deep learning has revolutionized the field of computer vision and image processing. Its ability to extract compact image representations has taken the person re-identification (re-id) problem to a new level. However, most research focuses on developing new approaches that extract richer image representations for the re-id task. Extra information about the images is rarely taken into account because traditional person re-id datasets usually do not include it. Nevertheless, research in multimodal machine learning has demonstrated that using information from different sources leads to better performance. In this work, we demonstrate how the person re-id problem can benefit from multimodal data. We used a UAV to collect and label a new person re-id dataset composed of pedestrian images and their attributes. We manually annotated this dataset with attributes and, in contrast to recent research, we do not use a deep network to classify them. Instead, we employ the continuous bag-of-words model to extract word embeddings from text descriptions and fuse them with features extracted from the images. A deep neural decision forest is then used for pedestrian classification. Extensive experiments on the collected dataset demonstrate the effectiveness of the proposed model.
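The pipeline described in the abstract (text embeddings fused with image features, then a decision-forest-style classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the vocabulary, the dimensions, the random vectors standing in for trained continuous bag-of-words embeddings and CNN features, the use of concatenation as the fusion rule, and the single depth-2 soft tree standing in for a full deep neural decision forest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and vocabulary; none of these values come from the paper.
EMBED_DIM = 16    # size of each word embedding
IMAGE_DIM = 128   # size of the image feature vector
NUM_IDS = 5       # number of pedestrian identities
VOCAB = ["male", "female", "backpack", "hat", "red", "blue", "shirt", "shorts"]

# Random vectors stand in for embeddings learned by a CBOW model.
EMBEDDINGS = {word: rng.normal(size=EMBED_DIM) for word in VOCAB}


def embed_description(description: str) -> np.ndarray:
    """Average the embeddings of known words; return zeros if none are known."""
    vectors = [EMBEDDINGS[w] for w in description.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return np.zeros(EMBED_DIM)
    return np.mean(vectors, axis=0)


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length so both modalities have comparable magnitude."""
    return x / (np.linalg.norm(x) + eps)


def fuse(image_features: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Fuse modalities by concatenation (an assumed fusion rule)."""
    return np.concatenate([l2_normalize(image_features), l2_normalize(text_embedding)])


def softmax_rows(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_tree_predict(x, split_weights, split_bias, leaf_distributions):
    """Depth-2 soft decision tree: 3 split nodes route the sample to 4 leaves.

    Each split node outputs the probability of going left; the prediction is
    the mixture of leaf class distributions weighted by leaf reach probability.
    """
    d = 1.0 / (1.0 + np.exp(-(split_weights @ x + split_bias)))  # shape (3,)
    leaf_reach = np.array([
        d[0] * d[1],              # root left, then left
        d[0] * (1 - d[1]),        # root left, then right
        (1 - d[0]) * d[2],        # root right, then left
        (1 - d[0]) * (1 - d[2]),  # root right, then right
    ])
    return leaf_reach @ leaf_distributions  # shape (NUM_IDS,)


# Example with stand-in data: random image features and untrained tree parameters.
image_features = rng.normal(size=IMAGE_DIM)
text_embedding = embed_description("male red shirt backpack")
fused = fuse(image_features, text_embedding)

split_weights = rng.normal(size=(3, fused.size))
split_bias = np.zeros(3)
leaf_distributions = softmax_rows(rng.normal(size=(4, NUM_IDS)))

probs = soft_tree_predict(fused, split_weights, split_bias, leaf_distributions)
print(probs)
```

In a trained system the split parameters and leaf distributions would be learned jointly with the feature extractor, and a forest would average the outputs of several such trees. The sketch only shows how a fused vector flows through one soft tree to yield a distribution over identities.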
Delving Deeper in Drone-Based Person Re-Id by Employing Deep Decision Forest and Attributes Fusion