Abstract
In recent years, enormous strides have been made in object detection and data association, two vital subtasks of one-stage online multi-object tracking (MOT). However, the two submodules in the MOT pipeline are usually processed and optimized separately, which results in complex method designs that require manual settings. Few works integrate the two subtasks into a single end-to-end network that optimizes the overall task. In this study, we propose an end-to-end MOT network, the joint detection and association network (JDAN), in which training and inference are performed in a single network. All layers in JDAN are differentiable and can be optimized jointly to detect targets and output an association matrix for robust multi-object tracking. In addition, we generate suitable pseudo-labels to address the data inconsistency between object detection and association. The detection and association submodules are optimized with a composite loss function derived from the detection results and the generated pseudo association labels, respectively. The proposed approach is evaluated on two MOT challenge datasets and achieves promising performance compared with classic and recent methods.
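The abstract does not give the form of the composite loss, so the following is only a minimal sketch of how a detection term and an association term could be combined. It assumes the association matrix holds raw affinity scores between objects in two consecutive frames, that each pseudo-label is the matched column index for a row (or -1 when no match exists), and that the two terms are summed with fixed weights. The names `association_loss`, `composite_loss`, `w_det`, and `w_assoc` are hypothetical and are not taken from the paper.

```python
import math


def association_loss(scores, pseudo_labels):
    """Mean row-wise softmax cross-entropy over rows that have a pseudo-label.

    scores[i][j]: assumed affinity between object i in frame t and
                  object j in frame t+1 (higher means more likely a match).
    pseudo_labels[i]: assumed matched column for row i, or -1 for no match.
    """
    total, count = 0.0, 0
    for row, label in zip(scores, pseudo_labels):
        if label < 0:
            continue  # unmatched rows contribute nothing in this sketch
        peak = max(row)
        # Numerically stable log-sum-exp of the row.
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


def composite_loss(det_loss, scores, pseudo_labels, w_det=1.0, w_assoc=1.0):
    """Weighted sum of a precomputed detection loss and the association loss."""
    return w_det * det_loss + w_assoc * association_loss(scores, pseudo_labels)


# Two objects in frame t, three candidates in frame t+1.
example_scores = [[4.0, 0.5, 0.1], [0.2, 0.3, 3.5]]
example_labels = [0, 2]
example_total = composite_loss(0.8, example_scores, example_labels)
```

Because both terms are plain differentiable expressions of the network outputs, a sum of this kind is one way a single backward pass could update the detection and association submodules together, which is the property the abstract emphasizes.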
JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking