Delving Deeper in Drone-Based Person Re-Id by Employing Deep Decision Forest and Attributes Fusion

Published: 25 April 2020

Abstract

Deep learning has revolutionized computer vision and image processing. Its ability to extract compact image representations has taken the person re-identification (re-id) problem to a new level. In most cases, however, researchers focus on developing new approaches that extract more informative image representations for the re-id task. Additional information about the images is rarely taken into account because traditional person re-id datasets usually do not include it. Nevertheless, research in multimodal machine learning has demonstrated that using information from different sources leads to better performance. In this work, we demonstrate how the person re-id problem can benefit from multimodal data. We used a UAV drone to collect and label a new person re-id dataset composed of pedestrian images and their attributes. We manually annotated this dataset with attributes and, in contrast to recent research, we do not use a deep network to classify them. Instead, we employ the continuous bag-of-words model to extract word embeddings from text descriptions and fuse them with features extracted from images. A deep neural decision forest is then used for pedestrian classification. Extensive experiments on the collected dataset demonstrate the effectiveness of the proposed model.
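
The pipeline outlined in the abstract (text embeddings fused with image features, then classified by a decision forest with soft, differentiable splits) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the embedding table stands in for vectors that would come from a trained continuous bag-of-words model, attribute words are pooled by averaging, fusion is plain concatenation, and each tree is a single linear layer feeding sigmoid routing functions. All function names are hypothetical.

```python
import numpy as np


def average_embedding(tokens, vocab, table):
    """Average the word vectors of the tokens found in the vocabulary.

    `table` is assumed to hold pre-trained word vectors (for example from a
    continuous bag-of-words model); mean pooling is an illustrative choice.
    """
    idx = [vocab[t] for t in tokens if t in vocab]
    if not idx:
        return np.zeros(table.shape[1])
    return table[idx].mean(axis=0)


def fuse(image_feat, text_feat):
    """Fuse the two modalities by concatenation (assumed fusion scheme)."""
    return np.concatenate([image_feat, text_feat])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def soft_tree_predict(x, W, b, leaf_dist):
    """Class probabilities from one soft decision tree.

    Split nodes are stored in heap order (root = 0, children 2n+1 and 2n+2).
    W: (n_leaves - 1, dim), b: (n_leaves - 1,)
    leaf_dist: (n_leaves, n_classes), each row a class distribution.
    """
    d = sigmoid(W @ x + b)               # probability of routing left at each split
    n_leaves = leaf_dist.shape[0]        # must be a power of two
    depth = n_leaves.bit_length() - 1
    mu = np.ones(n_leaves)               # probability of reaching each leaf
    for leaf in range(n_leaves):
        node = 0
        for level in range(depth):
            go_right = (leaf >> (depth - 1 - level)) & 1
            mu[leaf] *= (1.0 - d[node]) if go_right else d[node]
            node = 2 * node + 1 + go_right
    return mu @ leaf_dist


def forest_predict(x, trees):
    """Average the predictions of several trees given as (W, b, leaf_dist)."""
    return np.mean([soft_tree_predict(x, W, b, L) for W, b, L in trees], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab = {"red": 0, "backpack": 1, "jeans": 2}      # toy attribute vocabulary
    table = rng.normal(size=(len(vocab), 8))           # stand-in word vectors
    image_feat = rng.normal(size=16)                   # stand-in image features
    x = fuse(image_feat, average_embedding(["red", "backpack"], vocab, table))

    n_classes, n_leaves = 5, 4
    trees = []
    for _ in range(3):
        leaves = rng.random(size=(n_leaves, n_classes))
        leaves /= leaves.sum(axis=1, keepdims=True)    # rows become distributions
        trees.append((rng.normal(size=(n_leaves - 1, x.size)),
                      np.zeros(n_leaves - 1), leaves))
    print(forest_predict(x, trees))
```

Because the leaf-reaching probabilities of each tree sum to one and every leaf holds a class distribution, the output is itself a valid distribution over identities, so the forest can sit on top of a feature extractor and be trained with a standard classification loss.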

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
      Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
      January 2020
      376 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3388236
      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 April 2020
      • Accepted: 1 September 2019
      • Revised: 1 August 2019
      • Received: 1 April 2019

      Qualifiers

      • research-article
      • Research
      • Refereed
