JDAN: Joint Detection and Association Network for Real-Time Online Multi-Object Tracking

Published: 03 February 2023

Abstract

In recent years, enormous strides have been made in object detection and data association, the two vital subtasks of one-stage online multi-object tracking (MOT). However, most pipelines process or optimize these two submodules separately, resulting in complex method designs that require manual tuning, and few works integrate the two subtasks into a single end-to-end network that optimizes the overall task. In this study, we propose an end-to-end MOT network, the joint detection and association network (JDAN), that performs training and inference in a single network. All layers in JDAN are differentiable and can be optimized jointly to detect targets and output an association matrix for robust multi-object tracking. Moreover, we generate suitable pseudo-labels to address the data inconsistency between object detection and association: the detection and association submodules are optimized by a composite loss function derived from the detection results and the generated pseudo association labels, respectively. The proposed approach is evaluated on two MOT challenge datasets and achieves promising performance compared with both classic and recent methods.
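The abstract describes a composite loss that couples a detection term with an association term supervised by generated pseudo-labels. The paper's actual formulation is not given in this excerpt, so the sketch below is only a minimal illustration under assumed conventions: the function names (`pseudo_association_labels`, `composite_loss`), the IoU-based mutual-argmax matching rule, and the loss weighting are all hypothetical choices, not taken from JDAN.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def pseudo_association_labels(prev_boxes, cur_boxes, thr=0.5):
    # Hypothetical pseudo-label generator: mark a pair (i, j) as the same
    # identity when the two boxes are mutual IoU argmaxes above a threshold.
    n, m = len(prev_boxes), len(cur_boxes)
    sim = np.array([[iou(p, c) for c in cur_boxes] for p in prev_boxes])
    labels = np.zeros((n, m))
    for i in range(n):
        j = int(sim[i].argmax())
        if sim[i, j] >= thr and int(sim[:, j].argmax()) == i:
            labels[i, j] = 1.0
    return labels

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def composite_loss(det_scores, det_labels, assoc_logits, assoc_labels, w=1.0):
    # Illustrative composite loss: binary cross-entropy for detection plus a
    # row-wise cross-entropy over the association matrix for matched rows.
    eps = 1e-9
    det = -np.mean(det_labels * np.log(det_scores + eps)
                   + (1 - det_labels) * np.log(1 - det_scores + eps))
    probs = _softmax(assoc_logits)
    matched = assoc_labels.sum(axis=1) > 0
    if matched.any():
        assoc = -np.mean(np.log(
            (probs[matched] * assoc_labels[matched]).sum(axis=1) + eps))
    else:
        assoc = 0.0
    return det + w * assoc
```

In an end-to-end tracker both terms would be backpropagated through a shared backbone; the numpy version here only demonstrates how pseudo association labels can supervise an association matrix alongside a detection loss.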


Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1s (February 2023), 504 pages
ISSN: 1551-6857; EISSN: 1551-6865
DOI: 10.1145/3572859
Editor: Abdulmotaleb El Saddik


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 14 August 2021
• Revised: 5 February 2022
• Accepted: 22 April 2022
• Online AM: 2 May 2022
• Published: 3 February 2023
