Multitarget Tracking Using Siamese Neural Networks

Published: 18 May 2021

Abstract

In this article, we detect and track visual objects by using a Siamese network, also known as a twin neural network. The Siamese network is constructed to classify moving objects based on the association between an object detection network and an object tracking network, which are regarded as the two branches of the twin neural network. The proposed tracking method was originally designed for single-target tracking; we extend it to multitarget tracking by using deep neural networks and object detection. The contributions of this article are as follows. First, we implement the proposed method for visual object tracking based on multiclass classification using deep neural networks. Second, we achieve multitarget tracking by combining the object detection network with the single-target tracking network. Third, we improve tracking performance by fusing the outputs of the object detection network and the object tracking network. Finally, we address the object occlusion problem using IoU (intersection over union) and a similarity score, which effectively reduces the influence of occlusion on multitarget tracking.
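
The abstract names IoU and a similarity score as the cues used for occlusion but gives no formulas or thresholds. The following is a minimal sketch of how the two cues could be computed and combined, not the authors' method. The `(x1, y1, x2, y2)` box format, the use of cosine similarity as the similarity score, the helper name `is_probably_occluded`, and both threshold values are assumptions made for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def is_probably_occluded(target_box, other_box, template_feat, current_feat,
                         iou_thresh=0.3, sim_thresh=0.5):
    """Hypothetical rule: flag occlusion when the target overlaps another
    object heavily AND its appearance no longer matches its template.
    Both thresholds are illustrative assumptions, not values from the article.
    """
    overlaps = iou(target_box, other_box) > iou_thresh
    looks_different = cosine_similarity(template_feat, current_feat) < sim_thresh
    return overlaps and looks_different
```

Requiring both conditions is one plausible design choice: overlap alone also occurs when two targets merely pass close to each other, and a low similarity score alone can come from a pose or lighting change rather than occlusion.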

    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2s
      June 2021
      349 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3465440

      Copyright © 2021 Association for Computing Machinery.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 May 2021
      • Revised: 1 December 2020
      • Accepted: 1 December 2020
      • Received: 1 July 2020

      Qualifiers

      • research-article
      • Refereed
