Abstract
In this article, we detect and track visual objects by using a Siamese network, also known as a twin neural network. The Siamese network is constructed to classify moving objects based on the association between an object detection network and an object tracking network, which are treated as the two branches of the twin neural network. The proposed tracker is designed for single-target tracking and is extended to multitarget tracking by using deep neural networks and object detection. The contributions of this article are as follows. First, we implement the proposed method for visual object tracking based on multiclass classification using deep neural networks. Second, we attain multitarget tracking by combining the object detection network and the single-target tracking network. Third, we improve tracking performance by fusing the outputs of the object detection network and the object tracking network. Finally, we address the object occlusion problem using intersection over union (IoU) and a similarity score, which effectively diminishes the influence of occlusion in multitarget tracking.
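To make the occlusion cue mentioned above concrete, the following is a minimal sketch of how IoU between two bounding boxes can be computed and combined with a similarity score. It is an illustration only, not the method used in the article: the box format `(x1, y1, x2, y2)`, the function names `iou` and `likely_occluded`, and both thresholds are assumptions introduced here.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def likely_occluded(track_box: Box, other_box: Box, similarity: float,
                    iou_thresh: float = 0.5, sim_thresh: float = 0.6) -> bool:
    """Hypothetical rule: flag a track as possibly occluded when its box
    overlaps another box heavily while its appearance similarity is low.
    Both thresholds are illustrative values, not taken from the article."""
    return iou(track_box, other_box) > iou_thresh and similarity < sim_thresh
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7, since the boxes share one unit of area out of a union of seven. How the article actually combines the two cues is not stated in the abstract, so the rule in `likely_occluded` should be read as one plausible choice.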
Index Terms
Multitarget Tracking Using Siamese Neural Networks